Basic properties of vectors, matrices, determinants, eigenvalues and eigenvectors are
discussed. Then, applications of matrices and determinants to various areas of sta- 
tistical problems such as principal components analysis, model building, regression 
analysis, canonical correlation analysis, design of experiments etc. are examined. Ap- 
plications of vector/matrix derivatives in the simplification of Taylor expansions of 
functions of many real scalar variables are considered. Jacobians of matrix transfor- 
mations of real-valued scalar functions of matrix argument, maxima/minima prob- 
lems, optimizations of linear forms, quadratic forms, bilinear forms with linear and 
quadratic constraints are examined. Matrix sequences and series, convergence of ma- 
trix series etc. and applications in physical sciences, chemical sciences, social sci- 
ences, input-analysis, linear programming problem, non-linear least squares and dy- 
namic programming problems etc. are studied in this book. 

Each topic is motivated by real-life situations and each concept is illustrated with
examples and counterexamples. The book has been class-tested since 1999. It draws on
fifty years of teaching experience in various universities around the world. The first
three Modules of the Centre for Mathematical and Statistical Sciences (CMSS) are combined
to make this book. These Modules are used for the intensive undergraduate mathematics
training camps of CMSS. Each camp is a 10-day intensive training course with
40 hours of lectures and 40 hours of problem-solving sessions. Thirty such camps have
already been conducted by CMSS. Only high school level mathematics is assumed. The book
is written as self-study material. Each topic is developed from fundamentals to the
senior undergraduate or graduate level. Common doubts of students on various topics
are answered in the book.

Since 2004, the material in this book has been made available to UN-affiliated Regional
Centres for Space Science and Technology Education, located in India, China,
Morocco, Nigeria, Jordan, Brazil, and Mexico
(http://www.unoosa.org/oosa/en/ourwork/psa/regional-centres/index.html).

Since 1988 the material has been taken into account for the development of education
curricula in the fields of remote sensing and geographic information systems,
satellite meteorology and global climate, satellite communications, space and atmospheric
science, and global navigation satellite systems
(http://www.unoosa.org/oosa/en/ourwork/psa/regional-centres/study_curricula.html).

As such the material was considered to be a prerequisite for applications, teach- 
ing, and research in space science and technology. It was also a prerequisite for the 
nine-months post-graduate courses in the five disciplines of space science and tech- 
nology, offered by the Regional Centres on an annual basis to participants from all 194 
Member States of the United Nations. 

Since 1991, whenever suitable at the research level, the material in this book has been
utilized in lectures in a series of annual workshops and follow-up projects of the
so-called Basic Space Science Initiative of the United Nations
(http://www.unoosa.org/oosa/en/ourwork/psa/bssi/index.html).

As such the material was considered a prerequisite for teaching and research in 
astronomy and physics. 

RIPPLE SIGHTING The cosmic dance of two black holes warped spacetime as the
pair spiraled inward and merged, creating gravitational waves (illustration below).
The Advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detected these
ripples, produced by black holes eight and 14 times the mass of the sun, on
December 26, 2015. Einstein's theory of general relativity was 100 years old in 2015. It
has been very important in applications such as GPS (GNSS), and tremendously successful in
understanding astrophysical systems like black holes. Gravitational waves, which are
ripples in the fabric of space and time produced by violent events in the distant
universe (for example, by the collision of two black holes or by the cores of supernova
explosions), were predicted by Albert Einstein in 1916 as a consequence of his general
theory of relativity. Gravitational waves are emitted by accelerating masses in much
the same way as electromagnetic waves are produced by accelerating charges, such as
radio waves radiated by electrons accelerating in antennas. As they travel to Earth,
these ripples in the space-time fabric carry information about their violent origins
and about the nature of gravity that cannot be obtained by traditional astronomical
observations using light. Gravitational waves have now been detected directly. Even
before that detection, scientists had great confidence that they exist because their
influence on a binary pulsar system (two neutron stars orbiting each other) has been
measured accurately and is in excellent agreement with the predictions. Directly
detecting gravitational waves has confirmed Einstein's prediction in a new regime of
extreme relativistic conditions, and opens a promising new window into some of the most
violent and cataclysmic events in the cosmos. The GNSS education curriculum provides
opportunities to teach navigation and do research in astrophysics (basic space science).
The development of the education curricula (illustrated above) started in 1988 at UN
Headquarters in New York; the specific GNSS curriculum emerged only in 1999 after the
UNISPACE III Conference, held at and hosted by the United Nations at Vienna.

Students from areas other than mathematics are usually intimidated by
seeing theorems and proofs. Hence the word “theorem” is not used in the book.
Main results are called “results” and are written in bold so that the material will be
user-friendly.

This book can be used as a textbook for a beginning undergraduate level course 
on vectors, matrices and determinants, and their applications, for students from all 
disciplines. 


Preface 


The basic material in this book originated from a course given by the first author at
the University of Texas at El Paso in the 1998-1999 academic year. Students from
mathematics, engineering, biology, economics, physics and chemistry were in the class.
The textbook assigned to the course did not satisfy the students from any of the
disciplines, including mathematics. Hence Dr Mathai started developing a course from
fundamentals, assuming no background, with lots of examples and counterexamples
taken from day-to-day life. Students from all disciplines enjoyed the course. Dr
Mathai gave courses on calculus and linear algebra, and for both of these courses he
developed his own materials in close interaction with students. The El Paso experiment
was initially for one semester only but, due to its popularity, was extended to more
semesters.

During 2000 to 2006 these notes were developed into CMSS Modules and based 
on these Modules, occasional courses were given for teachers and students at various 
levels in Kerala, India, as per requests from teachers. From 2007 onward CMSS became 
a Department of Science and Technology, Government of India centre for mathemat- 
ical and statistical sciences. Modules in other areas were also developed during this 
period, and by 2014, ten Modules were developed. 

As a Life Member of CMSS, the second author is an active participant of all pro- 
grams at CMSS, including the undergraduate mathematics training camps, Ph.D train- 
ing etc. and he is also a frequent visitor to CMSS to participate in and contribute to 
various activities. 

Chapter 1 is devoted to all basic properties of vectors as ordered sets of real numbers.
Each definition is motivated by real-life examples. After introducing major properties
of vectors with real elements, vectors in the complex domain are considered
and more rigorous definitions are introduced. Chapter 1 ends with the Gram-Schmidt
orthogonalization process.

Chapter 2 deals with matrices. Again, all definitions and properties are introduced 
from real-life situations. Roles of elementary matrices and elementary operations in 
solving linear equations, checking consistency of linear systems, checking linear de- 
pendence of vectors, evaluating rank of a matrix, canonical reductions of quadratic 
and bilinear forms, triangularizations and diagonalizations of matrices, computing 
inverses of matrices etc. are highlighted. 

Chapter 3 deals with determinants. An axiomatic definition is introduced. Various
types of expansions of determinants are given. The role of elementary matrices in
evaluating determinants is highlighted. This chapter leads into Chapter 4 on eigenvalues
and eigenvectors and their properties.

Chapters 5 and 6 are on applications of matrices and determinants to various
disciplines. Applications to maxima/minima problems, constrained maxima/minima, and
optimization of linear, quadratic and bilinear forms with linear and quadratic
constraints are considered. For each optimization, at least one practical procedure such
as principal components analysis, canonical correlation analysis, regression analysis
etc. is illustrated. Some additional topics are also developed in Chapter 6: matrix
polynomials, matrix sequences and series, convergence, norms of matrices, singular value
decomposition of matrices, simultaneous reduction of matrices to diagonal forms etc.


A. M. Mathai
H. J. Haubold
14th March 2017


1 Vectors


1.0 Introduction 


We start with vectors as ordered sets in order to introduce various aspects of these
objects called vectors and the different properties enjoyed by them. After having
discussed the basic ideas, a formal definition, as objects satisfying some general
conditions, will be introduced later on. Several examples from various disciplines will be
introduced to indicate the relevance of the concepts in various areas of study. As
students may already know, a collection of well-defined objects is called a set. For
example, \(\{2, \alpha, B\}\) is a set of 3 objects, the objects being the number 2, the
Greek letter \(\alpha\) and the capital letter \(B\). Sets are usually denoted by curly
brackets {list of objects}. Each object in the set is called an element of the set. Let
the above set be denoted by \(S\); then \(S = \{2, \alpha, B\}\). Then 2 is an element of
\(S\). It is usually written as \(2 \in S\) (2 in \(S\), or 2 is an element of \(S\)).
Thus we have


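\[ S = \{2, \alpha, B\},\quad 2 \in S,\quad \alpha \in S,\quad B \in S,\quad 7 \notin S,\quad \gamma \notin S \tag{1.0.1} \]


where \(\notin\) indicates “not in”. That is, 7 is not in \(S\) and \(\gamma\) (gamma) is not an element of \(S\).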
For a set, the order in which the elements are written is unimportant. We could 
have represented S equivalently as follows: 


\[ S = \{2, \alpha, B\} = \{2, B, \alpha\} = \{\alpha, 2, B\} = \{\alpha, B, 2\} = \{B, 2, \alpha\} = \{B, \alpha, 2\} \tag{1.0.2} \]


because all of these sets contain the same objects and hence they represent the same 
set. Now, we consider ordered sets. In (1.0.2) there are 6 ordered arrangements of the 
3 elements. Each permutation (rearrangement) of the objects gives a different ordered 
set. With a set of n distinct objects we can have a total of n! = (1)(2) ... (n) ordered sets. 
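
The count n! can be checked on the small set S with a short Python sketch. The strings used below to stand for 2, the Greek letter alpha and B are placeholders chosen for this illustration only.

```python
from itertools import permutations
from math import factorial

# The three objects of the set S; strings stand in for the symbols.
objects = ["2", "alpha", "B"]

# Each permutation of the objects is one ordered set.
ordered_sets = list(permutations(objects))

for arrangement in ordered_sets:
    print(arrangement)

# A set of n distinct objects yields n! ordered sets (here 3! = 6).
print(len(ordered_sets) == factorial(len(objects)))  # True
```

The six printed tuples correspond to the six arrangements listed in (1.0.2).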


1.1 Vectors as ordered sets 


For the time being we will define a vector as an ordered set of objects. More rigorous
definitions will be given later on in our discussions. Vectors or these ordered sets will
be denoted by ordinary brackets (ordered list of elements) or by square brackets
[ordered list of elements]. For example, if the ordered sequences are taken from (1.0.2)
then we have six vectors. If these are denoted by \(V_1, V_2, \ldots, V_6\) respectively,
then we have


\[ V_1 = (2, \alpha, B),\quad V_2 = (2, B, \alpha),\quad V_3 = (\alpha, 2, B), \]
\[ V_4 = (\alpha, B, 2),\quad V_5 = (B, 2, \alpha),\quad V_6 = (B, \alpha, 2). \]

We could have also represented these by square brackets, that is,

\[ V_1 = [2, \alpha, B],\ \ldots,\ V_6 = [B, \alpha, 2]. \tag{1.1.1} \]


As a convention, we will use either all ordinary brackets (·) or all square brackets [·]
when we discuss a given collection of vectors. The two notations will not be mixed
in the same collection. We could have also written the ordered sequences as columns,
rather than as rows. For example,


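\[ U_1 = \begin{pmatrix} 2 \\ \alpha \\ B \end{pmatrix},\ \ldots,\ U_6 = \begin{pmatrix} B \\ \alpha \\ 2 \end{pmatrix} \quad\text{or}\quad U_1 = \begin{bmatrix} 2 \\ \alpha \\ B \end{bmatrix},\ \ldots,\ U_6 = \begin{bmatrix} B \\ \alpha \\ 2 \end{bmatrix} \tag{1.1.2} \]


also represent the same collection of ordered sets or vectors. In (1.1.1) they are written
as row vectors whereas in (1.1.2) they are written as column vectors.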


Definition 1.1.1 (An n-vector). It is an ordered set of n objects written either as a row 
(a row n-vector) or as a column (a column n-vector). 


Example 1.1.1 (Stock market gains). A person has invested in 4 different stocks. Taking
January 1, 1998 as the base, the person watches the gain/loss, from this base
value, at the end of each week.


          Stock 1   Stock 2   Stock 3   Stock 4


Week 1      100       150       -50        50
Week 2       50       -50        70       -50
Week 3     -150      -100       -20         0


The performance vector at the end of week 1 is then (100, 150, -50, 50), a negative
number indicating a loss and a positive number denoting a gain. The performance vector
of stock 1 over the three weeks is \(\begin{pmatrix} 100 \\ 50 \\ -150 \end{pmatrix}\).
Observe that we could have also written weeks as columns and stocks as rows instead of
the above format. Note also that for each element the position where it appears is
relevant; in other words, the elements above are ordered.


Example 1.1.2 (Consumption profile). Suppose the following are the data on the food 
consumption of a family in a certain week, where q denotes quantity (in kilograms) 
and p denotes price per unit (per kilogram). 


      Beef     Pork     Chicken   Vegetables   Cereals


q     10       15       20        10           5
p     $2.00    $1.50    $0.50     $1.00        $3.45


The vector of quantities consumed is [10, 15, 20, 10, 5] and the price vector is [2.00, 1.50, 
0.50, 1.00, 3.45]. 




Example 1.1.3 (Discrete statistical distributions). If a discrete random variable takes
values \(x_1, x_2, \ldots, x_n\) with probabilities \(p_1, \ldots, p_n\) respectively, where
\(p_i > 0\), \(i = 1, \ldots, n\), \(p_1 + \cdots + p_n = 1\), then this distribution can be
represented as follows:


x-values        x_1   x_2   ...   x_n
probabilities   p_1   p_2   ...   p_n


As an example, if x takes the values 0, 1, -1 (such as a gambler gains nothing, gains
one dollar, loses one dollar) with probabilities \(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\)
respectively then the distribution can be written as


x-values        0     1     -1
probabilities   1/2   1/4   1/4


Here the observation vector is (0, 1, -1) and the corresponding probability vector is
\((\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4})\). Note that when writing the elements of a
vector, the elements may be separated by sufficient spaces, or by commas if there is
possibility of confusion. Any vector \((p_1, \ldots, p_n)\) such that \(p_i > 0\),
\(i = 1, \ldots, n\), \(p_1 + \cdots + p_n = 1\) is called a discrete probability
distribution.
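
The two conditions in this definition (positive entries that add up to 1) can be checked mechanically. The following Python sketch is only an illustration: the function name and the sample vectors are assumptions made here, not data from the text, and the sum is compared with 1 up to floating-point rounding.

```python
from math import isclose

def is_probability_vector(p):
    """Return True if every entry is positive and the entries sum to 1."""
    return all(p_i > 0 for p_i in p) and isclose(sum(p), 1.0)

# Illustrative vectors chosen for this sketch.
print(is_probability_vector((0.5, 0.25, 0.25)))  # True
print(is_probability_vector((0.7, 0.4)))         # False: the entries sum to 1.1
print(is_probability_vector((1.5, -0.5)))        # False: one entry is negative
```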


Example 1.1.4 (Transition probability vector). Suppose at El Paso, Texas, there are 
only two possibilities for a September day. It can be either sunny and hot or cloudy 
and hot. Let these be denoted by S (sunny) and C (cloudy). A sunny day can be fol- 
lowed by either a sunny day or a cloudy day and similarly a cloudy day can follow 
either a sunny or a cloudy day. Suppose that the chances (transition probabilities) are 
the following: 


S C 


S 0.95 0.05 
C 0.90 0.10 


Then for a sunny day the transition probability vector is (0.95, 0.05) to be followed by 
a sunny and a cloudy day respectively. For a cloudy day the corresponding vector is 
(0.90, 0.10). 


Example 1.1.5 (Error vector). Suppose that an automatic machine is filling 5 kg bags of
potatoes. The machine is not allowed to cut or chop potatoes to make the weight exactly
5 kg. Naturally, if one such bag is taken then the actual weight can be less than,
greater than or equal to 5 kg. Let \(\epsilon\) denote the error = observed weight minus
the expected weight (5 kg). [One could have defined “error” as expected value minus the
observed value.] Suppose 4 such bags are selected and weighed. Suppose the observation
vector, denoted by X, is


X = (5.01, 5.10, 4.98, 4.92).

Then the error vector, denoted by \(\epsilon\), is


\[ \epsilon = (5.01 - 5.00,\ 5.10 - 5.00,\ 4.98 - 5.00,\ 4.92 - 5.00) = (0.01,\ 0.10,\ -0.02,\ -0.08). \]


Note that we could have written both X and \(\epsilon\) as column vectors as well.
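
The element-by-element subtraction in this example can be sketched in Python as follows. The variable names are chosen for this illustration, and the rounding to two decimals is an assumption added only to suppress floating-point noise.

```python
# Observed weights of the 4 bags and the expected weight, in kilograms.
expected_weight = 5.00
observed = (5.01, 5.10, 4.98, 4.92)

# Error = observed weight minus expected weight, element by element.
errors = tuple(round(w - expected_weight, 2) for w in observed)
print(errors)  # (0.01, 0.1, -0.02, -0.08)
```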


Example 1.1.6 (Position vector). Suppose a person walks on a straight (horizontal) path
for 4 miles and then along a perpendicular path to the left for another 6 miles. If
these distances are denoted by x and y respectively then her position vector is, taking
the starting point as the origin,


(x,y) = (4, 6). 


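Example 1.1.7 (Vector of partial derivatives). Consider \(f(x_1, \ldots, x_n)\), a scalar
function of n real variables \(x_1, \ldots, x_n\). As an example,


\[ f(x_1, x_2, x_3) = 3x_1^2 + x_2^2 + x_3^2 - 2x_1x_2 + 5x_1x_3 - 2x_1 + 7. \]


Here n = 3 and there are 3 variables in f. Consider the partial derivative operators
\(\frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \frac{\partial}{\partial x_3}\);
that is, \(\frac{\partial}{\partial x_1}\) operating on f means to differentiate f with
respect to \(x_1\), treating \(x_2\) and \(x_3\) as constants. For example,
\(\frac{\partial}{\partial x_1}\) operating on the above f gives


\[ \frac{\partial f}{\partial x_1} = 6x_1 - 2x_2 + 5x_3 - 2. \]

Consider the partial differential operator

\[ \frac{\partial}{\partial X} = \left(\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\right). \]


Then \(\frac{\partial}{\partial X}\) operating on f gives the vector


\[ \frac{\partial f}{\partial X} = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right). \]


For the above example,

\[ \frac{\partial f}{\partial X} = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \frac{\partial f}{\partial x_3}\right) = (6x_1 - 2x_2 + 5x_3 - 2,\ 2x_2 - 2x_1,\ 2x_3 + 5x_1). \]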
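
One way to sanity-check a vector of partial derivatives is to compare it with a numerical approximation. The Python sketch below assumes the example function f(x1, x2, x3) = 3x1^2 + x2^2 + x3^2 - 2x1x2 + 5x1x3 - 2x1 + 7 and the vector of partial derivatives (6x1 - 2x2 + 5x3 - 2, 2x2 - 2x1, 2x3 + 5x1). The helper names, the step size h and the test point are choices made for this illustration.

```python
def f(x1, x2, x3):
    """The example function of three real variables."""
    return 3*x1**2 + x2**2 + x3**2 - 2*x1*x2 + 5*x1*x3 - 2*x1 + 7

def gradient(x1, x2, x3):
    """The vector of partial derivatives obtained by hand."""
    return (6*x1 - 2*x2 + 5*x3 - 2, 2*x2 - 2*x1, 2*x3 + 5*x1)

def numerical_gradient(func, point, h=1e-6):
    """Central-difference approximation of each partial derivative."""
    approx = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += h    # vary only the i-th variable,
        backward[i] -= h   # keeping the others constant
        approx.append((func(*forward) - func(*backward)) / (2 * h))
    return tuple(approx)

point = (1.0, 2.0, -1.0)   # an arbitrary test point
exact = gradient(*point)
approx = numerical_gradient(f, point)
print(exact)               # (-5.0, 2.0, 3.0)
print(all(abs(a - e) < 1e-4 for a, e in zip(approx, exact)))  # True
```

Agreement at one point does not prove the formula, but a mismatch would reveal an algebra slip.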


Example 1.1.8 (Students' grades). Suppose that Miss Gomez, a first year student 
at UTEP, is taking 5 courses, Calculus I (course 1), Linear Algebra (course 2),..., 
(course 5). Suppose that each course requires 2 class tests, a set of assignments to 
be submitted and a final exam. Suppose that Miss Gomez' performance profile is the 
following (all grades in percentages): 
               course 1   course 2   course 3   course 4   course 5


test 1            80         85         80         90         95
test 2            85         85         85         95        100
assignments      100        100        100        100        100
final exam        90         95         90         92         95


Then for example, her performance profiles on courses 1 and 4 are


\[ \begin{pmatrix} 80 \\ 85 \\ 100 \\ 90 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 90 \\ 95 \\ 100 \\ 92 \end{pmatrix} \]


respectively. Her performance on all courses for test 1 is the vector (80, 85, 80, 90, 95).


Example 1.1.9 (Fertility data). Fertility of women is often measured in terms of the 
number of children produced. Suppose that the following data represent the average 
number of children in a particular State according to age and racial groups: 


             group 1   group 2   group 3   group 4


< 16           1         0.8       1.5       0.5
16 to < 18     1         1         0.8       0.9
18 to < 35     4         2         3         2
35 to < 50     1         0         2         0
> 50           0         0         1         0


The first row vector in the above table represents the performance of girls younger
than 16 years over the 4 racial groups. Column 2 represents the performance of racial
group 2 over the age groups, and so on.


Example 1.1.10 (Geometric probability law). Suppose that a person is playing a game
of chance in a casino. Suppose that the chance of winning at each trial is 0.2 and that
of losing is 0.8. Suppose that the trials are independent of each other. Then the person
can win at the first trial, or lose at the first trial and win at the second trial, or lose at
the first two trials and win at the third trial, and so on. Then the chance of winning at
the x-th trial, \(x = 1, 2, 3, \ldots\), is given by the vector


\[ [0.2,\ (0.8)(0.2),\ (0.8)^2(0.2),\ (0.8)^3(0.2),\ \ldots]. \]


It is an n-vector with \(n = +\infty\). Note that the number of ordered objects,
representing a vector, could be finite or infinitely many (countable, that is, one can
draw a one-to-one correspondence to the natural numbers 1, 2, 3, ...).
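
Since the vector has infinitely many elements, a program can only hold a finite leading part of it. The Python sketch below builds the first k elements; the function name and the choice k = 4 are assumptions made for this illustration.

```python
from math import isclose

def geometric_vector(p_win, k):
    """First k elements: chance of the first win occurring at trial x = 1, ..., k."""
    p_lose = 1 - p_win
    return [p_lose ** (x - 1) * p_win for x in range(1, k + 1)]

first_four = geometric_vector(0.2, 4)
for x, chance in enumerate(first_four, start=1):
    print(x, round(chance, 4))

# The second element is (0.8)(0.2) = 0.16, up to floating-point rounding.
print(isclose(first_four[1], 0.16))  # True
```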


In Example 1.1.1 suppose that the gains/losses were in US dollars and suppose that
the investor was a Canadian and she would like to convert the first week's gain/loss
into the Canadian dollar equivalent. Suppose that the exchange rate is US$ 1 = CA$ 1.60.
Then the first week's performance is available by multiplying each element in the
vector by 1.6. That is,


1.6(100, 150, -50, 50) = ((1.6)(100), (1.6)(150), (1.6)(-50), (1.6)(50))
= (160, 240, -80, 80).


Another example of this type is that someone has a measurement vector in feet and 
that is to be converted into inches, then each element is multiplied by 12 (one foot = 
12 inches), and so on. 


Definition 1.1.2 (Scalar multiplication of a vector). Let c be a scalar, a 1-vector, and
\(U = (u_1, \ldots, u_n)\) an n-vector. Then the scalar multiple of U, namely cU, is defined as


\[ cU = (cu_1, \ldots, cu_n). \tag{1.1.3} \]


As a convention the scalar quantity c is written on the left of U and not on the right,
that is, not as Uc but as cU. As numerical illustrations we have


1 -3 1 0 1 1 
1 2 
BEE der = Oe custos 
2 -6 2 0 2 1 
4(2, -1) = (8, -4).
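
Scalar multiplication as in (1.1.3) can be sketched in Python, representing a vector as a tuple. The function name is a choice made for this illustration; the numbers are the currency conversion of Example 1.1.1 and the illustration 4(2, -1).

```python
from math import isclose

def scalar_multiply(c, u):
    """Return cU: every element of the vector u multiplied by the scalar c."""
    return tuple(c * u_i for u_i in u)

week1_usd = (100, 150, -50, 50)
week1_cad = scalar_multiply(1.6, week1_usd)   # close to (160, 240, -80, 80)
print(week1_cad)

print(scalar_multiply(4, (2, -1)))            # (8, -4)
```

With a non-integer scalar such as 1.6 the results are floating-point numbers, so they should be compared with a tolerance rather than for exact equality.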


In Example 1.1.1 if the total (combined) gain/loss at the end of the second week is
needed then the combined performance vector is given by


(100 + 50, 150 - 50, -50 + 70, 50 - 50) = (150, 100, 20, 0).


If the combined performance of the first three weeks is required then it is the above
vector added to the third week's vector, that is,


(150, 100, 20, 0) + (-150, -100, -20, 0) = (0, 0, 0, 0).


Definition 1.1.3 (Addition of vectors). Let \(a = (a_1, \ldots, a_n)\) and
\(b = (b_1, \ldots, b_n)\) be two n-vectors. Then the sum is defined as


\[ a + b = (a_1 + b_1, \ldots, a_n + b_n), \tag{1.1.4} \]
that is, the vector obtained by adding the corresponding elements.


Note that vector addition is defined only for vectors of the same category and or- 
der. Either both are row vectors of the same order or both are column vectors of the 
same order. In other words, if U is an n-vector and V is an m-vector then U + V is not 
defined unless m = n, and further, both are either row vectors or column vectors. 
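
The rule (1.1.4), together with the requirement that both vectors have the same order, can be sketched in Python as follows. The function name is a choice made for this illustration, and the sketch checks only the number of elements; it does not model the distinction between row and column vectors.

```python
def add_vectors(a, b):
    """Return a + b by adding corresponding elements; the orders must agree."""
    if len(a) != len(b):
        raise ValueError("addition is defined only for vectors of the same order")
    return tuple(a_i + b_i for a_i, b_i in zip(a, b))

# Weekly performance vectors from Example 1.1.1.
week1 = (100, 150, -50, 50)
week2 = (50, -50, 70, -50)
week3 = (-150, -100, -20, 0)

two_weeks = add_vectors(week1, week2)
three_weeks = add_vectors(two_weeks, week3)
print(two_weeks)    # (150, 100, 20, 0)
print(three_weeks)  # (0, 0, 0, 0)
```

Calling `add_vectors((1, 2), (1, 2, 3))` raises a `ValueError`, mirroring the statement that U + V is not defined unless m = n.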
Definition 1.1.4 (A null vector). A vector all of whose elements are zero is called a
null vector and it is usually denoted by a big O.


In Example 1.1.1 the combined performance of the first 3 weeks is a null vector. In 
other words, after the first 3 weeks the performance is back to the base level. From the 
above definitions the following properties are evident. If U, V, W are three n-vectors 
(either all row vectors or all column vectors) and if a, b, c are scalars then 


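\[
\begin{aligned}
&U + V = V + U; \quad U + (V + W) = (U + V) + W; \\
&U - V = U + (-1)V; \quad U + O = O + U = U; \quad U - U = O; \\
&a[bU + cV] = abU + acV = b(aU) + c(aV).
\end{aligned} \tag{1.1.5}
\]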


Some numerical illustrations are the following: 


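\[
2\begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 2 \\ 0 \\ -2 \end{pmatrix} + \begin{pmatrix} 0 \\ -3 \\ 6 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 2 + 0 + 0 \\ 0 - 3 + 0 \\ -2 + 6 + 0 \end{pmatrix}
= \begin{pmatrix} 2 \\ -3 \\ 4 \end{pmatrix};
\]


(1, -7) + 6(0, -1) + (0, 0) = (1, -7) + (0, -6) + (0, 0)
= (1 + 0 + 0, -7 - 6 + 0) = (1, -13);
(1, 1, 2) - (1, 1, 2) = (1, 1, 2) + (-1, -1, -2)
= (1 - 1, 1 - 1, 2 - 2) = (0, 0, 0).
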
Definition 1.1.5 (Transpose of a vector). [Standard notations: \(U'\) = transpose of U,
\(U^T\) = transpose of U.] If U is a row n-vector then \(U'\) is the same written as a
column and vice versa.


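Some numerical illustrations are the following, where “\(\Rightarrow\)” means “implies”:


\[ U = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} \ \Rightarrow\ U' = [-3, 0, 1]; \qquad V = [1, 5, -1] \ \Rightarrow\ V' = \begin{bmatrix} 1 \\ 5 \\ -1 \end{bmatrix} = V^T. \]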

Note that in the above illustration U + V is not defined but U + V' is defined. Similarly 
U' + V is defined but U’ + V’ is not defined. Also observe that if z is a 1-vector (a scalar 
quantity) then z’ =z, that is, the transpose is itself. 
In Example 1.1.6 the position vector is (x, y) = (4, 6). Then the distance of this
position from the starting point is obtained from Pythagoras' rule as


\[ \sqrt{x^2 + y^2} = \sqrt{4^2 + 6^2} = \sqrt{52}. \]


This then is the straight-line distance from the starting point (0, 0) to the final
position (4, 6). We will formally define the length of a vector as follows; the idea will
be clearer when we consider the geometry of vectors later on:


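Definition 1.1.6 (Length of a vector). Let U be a real n-vector (either a column vector
or a row vector). If the elements of U are \(u_1, \ldots, u_n\) then the length of U,
denoted by \(\|U\|\), is defined as

\[ \|U\| = \sqrt{u_1^2 + \cdots + u_n^2}, \tag{1.1.6} \]


when the elements are real numbers. When the elements are not real then the length
will be redefined later on. Some numerical illustrations are the following:


\[ U = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix} \ \Rightarrow\ \|U\| = \sqrt{(1)^2 + (-1)^2 + (0)^2} = \sqrt{2}; \]

\[ V = (1, 1, -2) \ \Rightarrow\ \|V\| = \sqrt{(1)^2 + (1)^2 + (-2)^2} = \sqrt{6}; \]

\[ O = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \ \Rightarrow\ \|O\| = 0; \qquad e_1 = (1, 0, 0, 0) \ \Rightarrow\ \|e_1\| = 1; \]

\[ Z = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right) \ \Rightarrow\ \|Z\| = 1. \]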

Note that the “length”, by definition, is a non-negative quantity. It is either zero or a 
positive quantity and it cannot be negative. For a null vector the length is zero. The 
length of a vector is zero iff (if and only if) the vector is a null vector. 


Definition 1.1.7 (A unit vector). A vector whose length is unity is called a unit vector. 


Some numerical illustrations are the following:

\[ e_4 = (0, 0, 0, 1) \ \Rightarrow\ \|e_4\| = 1. \]

But \(U = (1, -2, 1) \Rightarrow \|U\| = \sqrt{6}\), so U is not a unit vector, whereas

\[ V = \frac{1}{\|U\|}U = \frac{1}{\sqrt{6}}(1, -2, 1) = \left(\frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right) \ \Rightarrow\ \|V\| = 1, \]

that is, V is a unit vector. Observe the following: A null vector is not a unit vector. If
the length of any vector is non-zero (the only vector with length zero is the null vector)
then taking a scalar multiple, where the scalar is the reciprocal of the length, a unit
vector can be created out of the given non-null vector. In general, if
\(U = (u_1, \ldots, u_n)\), where \(u_1, \ldots, u_n\) are real, then


\[ \|U\| = \|U'\| = \sqrt{u_1^2 + \cdots + u_n^2} \]


and

\[ V = \frac{1}{\|U\|}U \ \Rightarrow\ \|V\| = 1 \tag{1.1.7} \]

when \(\|U\| \ne 0\).
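
The length (1.1.6) and the normalization (1.1.7) can be sketched in Python as follows. The function names are choices made for this illustration; the vector U = (1, -2, 1) is the one used above.

```python
from math import sqrt, isclose

def length(u):
    """Length of a real vector: square root of the sum of squared elements."""
    return sqrt(sum(u_i ** 2 for u_i in u))

def normalize(u):
    """Return (1/||U||) U; a null vector has no unit multiple."""
    norm = length(u)
    if norm == 0:
        raise ValueError("a null vector cannot be normalized")
    return tuple(u_i / norm for u_i in u)

U = (1, -2, 1)
V = normalize(U)
print(isclose(length(U), sqrt(6)))  # True
print(isclose(length(V), 1.0))      # True: V is a unit vector
```

The explicit check for a zero length reflects the condition that (1.1.7) applies only to non-null vectors.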
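From the definition of length itself the following properties are obvious. If U and
V are n-vectors of the same type and if a, b, c are scalars, then
\[
\begin{aligned}
&\|cU\| = |c|\,\|U\| \ \Rightarrow\ \|cU + cV\| = |c|\,\|U + V\|; \\
&\|U + V\| \le \|U\| + \|V\|; \\
&\|aU + bV\| \le |a|\,\|U\| + |b|\,\|V\|
\end{aligned} \tag{1.1.8}
\]
where, for example, |c| means the absolute value of c, that is, the magnitude of c,
ignoring the sign. For example,


\[ \|-2(1, -1, 1)\| = |-2|\sqrt{(1)^2 + (-1)^2 + (1)^2} = 2\sqrt{3}; \]


\[ \|2(1, -1, 1)\| = 2\sqrt{3}; \]

\[ U = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix},\quad V = \begin{pmatrix} 1 \\ 2 \\ -3 \end{pmatrix} \ \Rightarrow\ U + V = \begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}; \]


\[ \|U\| = \sqrt{(1)^2 + (-1)^2 + (1)^2} = \sqrt{3}; \qquad \|V\| = \sqrt{(1)^2 + (2)^2 + (-3)^2} = \sqrt{14}; \]


\[ \|U + V\| = \sqrt{(2)^2 + (1)^2 + (-2)^2} = \sqrt{9} = 3 < \|U\| + \|V\| = \sqrt{3} + \sqrt{14}. \]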
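
The properties in (1.1.8) can be checked numerically for particular vectors. The Python sketch below uses U = (1, -1, 1), V = (1, 2, -3) and c = -2; the helper name is a choice made for this illustration, and a check on one pair of vectors is of course not a proof.

```python
from math import sqrt, isclose

def length(u):
    """Length of a real vector."""
    return sqrt(sum(u_i ** 2 for u_i in u))

U = (1, -1, 1)
V = (1, 2, -3)
U_plus_V = tuple(a + b for a, b in zip(U, V))   # (2, 1, -2)

# Triangle inequality: ||U + V|| <= ||U|| + ||V||
print(length(U_plus_V))                          # 3.0
print(length(U_plus_V) <= length(U) + length(V)) # True

# Homogeneity: ||cU|| = |c| ||U||
c = -2
cU = tuple(c * u_i for u_i in U)
print(isclose(length(cU), abs(c) * length(U)))   # True
```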


Now, we will look at another concept. In Example 1.1.2 the family’s total expense of the 
week on those food items is available by multiplying the quantities with unit prices 
and then adding up. That is, if the quantity vector is denoted by Q and the per unit 
price vector is denoted by P then 


Q = (10,15, 20, 10, 5) 
and 
P = (2.00, 1.50, 0.50, 1.00, 3.45). 


Thus the total expense of that family for that week on these 5 items is obtained by 
multiplying and adding the corresponding elements in P and Q. That is, 


(10)(2.00) + (15)(1.50) + (20) (0.50) + (10)(1.00) + (5)(3.45) = $79.75. 


It is a scalar quantity (1-vector) and not a 5-vector, even though the vectors Q and P are
5-vectors. For computing quantities such as the one above we define a concept called
the dot product or the inner product between two vectors.
Definition 1.1.8 (Dot product or inner product). Let U and V be two real n-vectors
(either both row vectors or both column vectors or one row vector and the other column
vector). Then the dot product between U and V, denoted by U.V, is defined as


\[ U.V = u_1v_1 + \cdots + u_nv_n, \]


that is, the corresponding elements are multiplied and added, where \(u_1, \ldots, u_n\) and
\(v_1, \ldots, v_n\) are the (real) elements in U and V respectively. (Vectors in the
complex field will be considered in a later chapter.)


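Some numerical illustrations are the following: In the above example, the family's
total expense for the week is Q.P = P.Q = 79.75.


\[ U_1 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix},\quad U_2 = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \ \Rightarrow\ U_1.U_2 = (0)(1) + (1)(-1) + (2)(1) = 1. \]

\[ V_1 = (3, 1, -1, 5),\quad V_2 = (-1, 0, 0, 1) \ \Rightarrow\ V_1.V_2 = (3)(-1) + (1)(0) + (-1)(0) + (5)(1) = 2. \]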
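
The dot product can be sketched in Python as follows, using the quantity and price vectors of Example 1.1.2 and the two illustrations above. The function name is a choice made for this illustration, and the rounding to two decimals is an assumption added for printing currency.

```python
def dot(u, v):
    """Dot product: multiply corresponding elements and add them up."""
    if len(u) != len(v):
        raise ValueError("the dot product needs vectors of the same order")
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

Q = (10, 15, 20, 10, 5)                # quantities in kilograms
P = (2.00, 1.50, 0.50, 1.00, 3.45)     # prices per kilogram
total_expense = dot(Q, P)
print(round(total_expense, 2))         # 79.75

print(dot((0, 1, 2), (1, -1, 1)))          # 1
print(dot((3, 1, -1, 5), (-1, 0, 0, 1)))   # 2
```

Because the function treats both arguments symmetrically, `dot(Q, P)` and `dot(P, Q)` give the same value, in line with Q.P = P.Q.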


From the definition itself the following properties are evident:
\[ U.O = 0, \quad aU.V = (aU).V = U.(aV) \]
where a is a scalar.
\[ U.V = V.U, \quad (aU).(bV) = ab(U.V) \]
where a and b are scalars.
\[ U.(V + W) = U.V + U.W = (W + V).U. \tag{1.1.9} \]


The notation with a dot, U.V, is an awkward one. But unfortunately this is a widely 
used notation. A proper notation in terms of transposes and matrix multiplication will 
be introduced later. Also, further properties of dot products will be considered later, 
after looking at the geometry of vectors as ordered sets. 


Exercises 1.1 


1.1.1. Are the following defined? Whichever is defined compute the answers. 


0 > 1 2 
(a) -1| + pi (b) 0/-3|0]|; 
1 1 0 


(c) (3, -1, 4) = (2, 1); (d) 5(1, 0) E 3(-2, -1). 
1.1.2. Compute the lengths of the following vectors. Normalize the vectors (create a
vector with unit length from the given vector) if possible: 


(a) (0, 0,0); (b) (1,1, -1) 


2 5 1 
(0) |-1]; (d) 0 |; (e) 3|-1 
1 E 1 


1.1.3. Convert the stock market performance vectors in Example 1.1.1 to the following: 
First week's performance into pound sterling (1$ = 0.5 pounds sterling); the second 
week's performance into Italian lira (1$ = 2000 lira). 


1.1.4. In Example 1.1.3 compute the expected value of the random variable. [The
expected value of a discrete random variable is denoted as E(x) and defined as
\(E(x) = x_1p_1 + \cdots + x_np_n\) if x takes the values \(x_1, \ldots, x_n\) with
probabilities \(p_1, \ldots, p_n\) respectively.] If it is a game of chance where the person
wins $0, $1, $(-1) (loses a dollar) with probabilities
\(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\) respectively, how much money can the person
expect to win in a given trial of the game?

1.1.5. In Example 1.1.3 if the expected value is denoted by \(\mu = X.P\) (\(\mu\) is the
Greek letter mu), where \(X = (x_1, \ldots, x_n)\) and \(P = (p_1, \ldots, p_n)\), then the
variance of the random variable is defined as the dot product between
\(((x_1 - \mu)^2, \ldots, (x_n - \mu)^2)\) and P. Compute the variance of the random
variable in Example 1.1.3. [Variance is the square of a measure of scatter or spread in
the random variable.]


1.1.6. In Example 1.1.5 compute the sum of squares of the errors. [Hint: If \(\epsilon\) is
the error vector then the sum of squares of the errors is available by taking the dot
product \(\epsilon.\epsilon\).]


1.1.7. In Example 1.1.8 suppose that for each course the distribution of the final grade 
is the following: 20 points each for each test, 10 points for assignments and 50 points 
for the final exam. Compute the vector of final grades of the student for the 5 courses 
by using the various vectors and using scalar multiplications and sums. 


1.1.8. From the chance vector in Example 1.1.10 compute the chance of ever winning 
(sum of the elements) and the expected number of trials for the first win, E(x) (note 
that x takes the values 1,2, ... with the corresponding probabilities). 


1.1.9. Consider an n-vector of unities denoted by J = (1, 1, ..., 1). If
\(X = (x_1, \ldots, x_n)\) is any n-vector then compute (a) \(X.J\); (b) \(\tfrac{1}{n}X.J\).


1.1.10. For the quantities in Exercise 1.1.9 establish the following:

(a) \((X - \tilde{\mu}).J = 0\) where \(\tilde{\mu} = \left(\tfrac{1}{n}X.J, \ldots, \tfrac{1}{n}X.J\right)\).
[This holds whatever be the values of \(x_1, \ldots, x_n\). Verify by taking some numerical
values.]

(b) \((X - \tilde{\mu}).(X - \tilde{\mu}) = X.X - n\left(\tfrac{1}{n}X.J\right)^2 = X.X - \tfrac{1}{n}(X.J)^2\)
whatever be the values of \(x_1, \ldots, x_n\).

(c) Show that the statement in (a) above is equivalent to the statement
\(\sum_{i=1}^{n}(x_i - \bar{x}) = 0\) where \(\bar{x} = \sum_{i=1}^{n} x_i/n\), with \(\sum\)
denoting a sum.

(d) Show that the statement in (b) is equivalent to the statement

\[ \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}x_i^2 - n\bar{x}^2 \]

whatever be \(x_1, \ldots, x_n\).

Here the notation \(\sum\) is used as follows:

\[ \sum_{i=1}^{n} a_i = a_1 + a_2 + \cdots + a_n, \]

that is, i is replaced by 1, 2, ..., n and the elements are added up.

\[ \sum_{i=1}^{n} a_ib_i = a_1b_1 + \cdots + a_nb_n = a.b \]

where \(a = (a_1, \ldots, a_n)\) and \(b = (b_1, \ldots, b_n)\).

\[ \sum_{i=1}^{n}(5a_i) = 5a_1 + \cdots + 5a_n = 5(a_1 + \cdots + a_n) = 5\sum_{i=1}^{n} a_i \]

\[ \sum_{i=1}^{n}\sum_{j=1}^{n} a_ib_j = \sum_{i=1}^{n} a_i \sum_{j=1}^{n} b_j = \sum_{i=1}^{n} a_i(b_1 + \cdots + b_n) = (a_1 + \cdots + a_n)(b_1 + \cdots + b_n) \]


1.1.11. When searching for maxima/minima of a scalar function f of many real scalar
variables the critical points (the points where one may find a maximum or a minimum
or a saddle point) are available by operating with \(\frac{\partial}{\partial X}\), equating to a null vector and


then solving the resulting equations. For the function 
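$$f(x_1,x_2) = 3x_1^2 + x_2^2 - 2x_1x_2 + 5$$
evaluate the following: (a) the operator $\frac{\partial}{\partial X}$, (b) $\frac{\partial f}{\partial X}$, (c) $\frac{\partial f}{\partial X} = O$, (d) the critical points.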


1.1.12. For the following vectors U,V,W compute the dot products U.V, U.W, V.W 
where 


U = (1,1,1), V = (1,-2,1), W = (1,0,-1).


1.1.13. If $V_1, V_2, V_3$ are n-vectors, either n x 1 column vectors or 1 x n row vectors, and
if $\|V_j\|$ denotes the length of the vector $V_j$ then show that the following results hold in
general:

(i) $\|V_1 - V_2\| \ge 0$ and $\|V_1 - V_2\| = 0$ iff $V_1 = V_2$;

(ii) $\|cV_1\| = |c|\,\|V_1\|$ where c is a scalar;

(iii) $\|V_1 - V_2\| + \|V_2 - V_3\| \ge \|V_1 - V_3\|$.


1.1.14. Verify (i), (ii), (iii) of Exercise 1.1.13 for 


$V_1 = (1,0,-1)$, $V_2 = (0,0,2)$, $V_3 = (2,1,-1)$.


1.1.15. Let U = (1,-1,1,-1). Construct three non-null vectors $V_1, V_2, V_3$ such that
$U.V_1 = 0$, $U.V_2 = 0$, $U.V_3 = 0$, $V_1.V_2 = 0$, $V_1.V_3 = 0$, $V_2.V_3 = 0$.


1.2 Geometry of vectors


From the position vector in Example 1.1.6 it is evident that (x, y) = (4, 6) can be denoted 
as a point in a 2-space (plane) with a rectangular coordinate system. In general, since 
an n-vector of real numbers is an ordered set of real numbers it can be represented as 
a point in a Euclidean n-space. 


1.2.1 Geometry of scalar multiplication 


If the position (4,6), which could also be written as the column vector $\binom{x}{y} = \binom{4}{6}$, is marked in a 2-space
then we have the following Figure 1.2.1. One can also think of this as an arrowhead
starting at (0,0) and going to (4,6). In this representation the vector has a length and
a direction. In general, if U is an arrowhead from the origin (0,0,...,0) in n-space to the
point $U = (u_1,\ldots,u_n)$ then -U will represent an arrowhead with the same length but
going in the opposite direction. Then cU will be an arrowhead in the same direction
with length $c\|U\|$ if c > 0 and in the opposite direction with length $|c|\,\|U\|$ if c < 0, where
|c| denotes the absolute value or the magnitude of c, and it is the origin itself if c = 0.
In physics, chemistry and engineering areas it is customary to denote a vector with an
arrow on top such as $\vec{U}$, meaning the vector U.


Figure 1.2.1: Geometry of vectors. 


1.2.2 Geometry of addition of vectors 


Scalar multiplication is interpreted geometrically as above. Then, what will be the 
geometrical interpretation for a sum of two vectors? For simplicity, let us consider a 
2-space. If $U = \binom{u_1}{u_2}$ and $V = \binom{v_1}{v_2}$ then algebraically

$$U + V = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}$$

which is the arrowhead representing the diagonal of the parallelogram as shown in
Figure 1.2.2. From the geometry of vectors one can notice that a vector, as an ordered set 
of real numbers, possesses two properties basically, namely, a length and a direction. 
Hence we can give a coordinate-free definition as an arrowhead with a length and a 
direction. 




Figure 1.2.2: Sum of two vectors. 


1.2.3 A coordinate-free definition of vectors 


Definition 1.2.1 (A coordinate-free definition for a vector). It is defined as an arrow- 
head with a given length and a given direction. 




Figure 1.2.3: Coordinate-free definition of vectors. 


In this definition, observe that all arrowheads with the same length and same direction
are taken to be one and the same vector as shown in Figure 1.2.3. We can move
an arrowhead parallel to itself. All such arrowheads obtained by such displacements
are taken as one and the same vector. If one has a coordinate system then move the vector
parallel to itself so that the tail-end (the other end to the arrow tip) coincides with
the origin of the coordinate system. Thus the position vectors are also included in this
general definition. In a coordinate-free definition one can construct U + V and U - V
as follows: Move U or V parallel to itself until the tail-ends coincide. Complete the parallelogram.
The leading diagonal gives U + V and the diagonal going from the head of
U to the head of V gives V - U and the one the other way around is -(V - U) = U - V.


1.2.4 Geometry of dot products 
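Consider a Euclidean 2-space and represent the vectors $U = (u_1,u_2)$ and $V = (v_1,v_2)$ as
points in a rectangular coordinate system. Let the angles, the vectors U and V make
with the x-axis be denoted by $\theta_1$ and $\theta_2$ respectively. Let

$$\theta = \theta_1 - \theta_2.$$

Then

$$\cos\theta_1 = \frac{u_1}{\sqrt{u_1^2 + u_2^2}}, \quad \cos\theta_2 = \frac{v_1}{\sqrt{v_1^2 + v_2^2}},$$

$$\sin\theta_1 = \frac{u_2}{\sqrt{u_1^2 + u_2^2}}, \quad \sin\theta_2 = \frac{v_2}{\sqrt{v_1^2 + v_2^2}}.$$

But

$$\cos\theta = \cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2
= \frac{u_1v_1 + u_2v_2}{\sqrt{u_1^2 + u_2^2}\,\sqrt{v_1^2 + v_2^2}} = \frac{U.V}{\|U\|\,\|V\|} \qquad (1.2.1)$$

whenever $\|U\| \ne 0$ and $\|V\| \ne 0$. Thus

$$U.V = \|U\|\,\|V\|\cos\theta, \quad \|U\| \ne 0, \ \|V\| \ne 0. \qquad (1.2.2)$$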


The dot product is the product of the lengths multiplied by the cosine of the angle
between the vectors. This result remains the same whatever be the space. That is, it
holds in 2-space, 3-space, 4-space and so on. Figure 1.2.4 shows the situation when
$0 \le \theta_1 \le \pi/2$, $0 \le \theta_2 \le \pi/2$, $\theta_1 > \theta_2$. The student may verify the result for all possible
cases of $\theta_1$ and $\theta_2$, as an exercise. From (1.2.1) we can obtain an interesting result.
Since $\cos\theta$, in absolute value, is less than or equal to 1 we have a result known as
the Cauchy-Schwarz inequality:

$$|\cos\theta| \le 1 \ \Rightarrow \ |U.V| \le \|U\|\,\|V\|.$$


Figure 1.2.4: Geometry of the dot product.


1.2.5 Cauchy-Schwarz inequality


$$|U.V| \le \|U\|\,\|V\|.$$


In other words, if $U = (u_1,\ldots,u_n)$ and $V = (v_1,\ldots,v_n)$ then for real $u_1,\ldots,u_n$ and
$v_1,\ldots,v_n$

$$|u_1v_1 + \cdots + u_nv_n| \le \sqrt{u_1^2 + \cdots + u_n^2}\,\sqrt{v_1^2 + \cdots + v_n^2}. \qquad (1.2.3)$$


When the angle $\theta$ between the vectors U and V is zero or $2n\pi$, n = 0,1,... then $\cos\theta = 1$
which means that the two vectors are scalar multiples of each other. (The same is true,
with a negative multiple, when $\cos\theta = -1$.) Thus we have an
interesting result:


(i) When equality in the Cauchy-Schwarz inequality holds the two vectors are
scalar multiples of each other, that is, U = cV where c is a scalar quantity.


When $\theta = \pi/2$ then $\cos\theta = 0$ which means U.V = 0. When the angle between the vectors
U and V is $\pi/2$ we say that the vectors are orthogonal to each other, and then the
dot product is zero. Orthogonality will be taken up later.
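The relations (1.2.1) to (1.2.3) are easy to check numerically. The following is a minimal sketch, assuming vectors are stored as plain Python tuples; the helper names `dot`, `norm` and `cos_angle` and the sample vectors are illustrative choices, not taken from the text.

```python
import math

def dot(u, v):
    """Dot product of two equal-length sequences of real numbers."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length of a vector, the square root of u.u."""
    return math.sqrt(dot(u, u))

def cos_angle(u, v):
    """cos(theta) as in (1.2.1); both vectors must be non-null."""
    return dot(u, v) / (norm(u) * norm(v))

# Hypothetical sample vectors (assumptions made for this illustration).
U = (1, 2, 2)
V = (2, 4, 4)    # V = 2U, so equality is expected in Cauchy-Schwarz
W = (2, -1, 0)   # chosen so that U.W = 0, that is, U and W are orthogonal

# Gap between the two sides of |U.V| <= ||U|| ||V||.
cs_gap_UV = norm(U) * norm(V) - abs(dot(U, V))   # zero for scalar multiples
cs_gap_UW = norm(U) * norm(W) - abs(dot(U, W))   # strictly positive here
```

In this sketch the gap vanishes only for the pair that are scalar multiples of each other, which is the content of result (i).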


Example 1.2.1. A girl is standing in a park and looking at a bird sitting on a tree. 
Taking one corner of the park as the origin and the rectangular border roads as the 
(x, y)-axes the positions of the girl and the tree are (1,2) and (10,15) respectively, all 
measurements in feet. The girl is 5 feet tall to her eye level and the bird's position 
from the ground is 20 feet up. Compute the following items: (a) The vector from the 
girl's eyes to the bird and its length; (b) The vector from the foot of the tree to the girl's 
feet and its length; (c) When the girl is looking at the bird the angle this path makes 
with the horizontal direction; (d) The angle this path makes with the vertical direction. 


Solution 1.2.1. The positions of the girl's eyes and the bird are respectively U = (1,2,5)
and V = (10,15,20).

(a) The vector from the girl's eyes to the bird is then

$$V - U = (10 - 1, 15 - 2, 20 - 5) = (9,13,15)$$

and its length is then

$$\|V - U\| = \sqrt{(9)^2 + (13)^2 + (15)^2} = \sqrt{475}.$$

(b) The foot of the tree is $V_0 = (10,15,0)$ and the position of the girl's feet is $U_0 = (1,2,0)$.
The vector from the foot of the tree to the girl's feet is then

$$U_0 - V_0 = (1,2,0) - (10,15,0) = (-9,-13,0)$$

and its length is

$$\|U_0 - V_0\| = \sqrt{(-9)^2 + (-13)^2 + (0)^2} = \sqrt{250}.$$

(c) From the girl's eyes the vector in the horizontal direction to the tree is

$$V_1 - U_1 = (10,15,5) - (1,2,5) = (10 - 1, 15 - 2, 5 - 5) = (9,13,0)$$

and its length is

$$\|V_1 - U_1\| = \sqrt{(9)^2 + (13)^2 + (0)^2} = \sqrt{250}.$$

Let $\theta$ be the angle between the vectors $V - U$ and $V_1 - U_1$. Then

$$\cos\theta = \frac{(V - U).(V_1 - U_1)}{\|V - U\|\,\|V_1 - U_1\|}
= \frac{(9,13,15).(9,13,0)}{\sqrt{475}\,\sqrt{250}} = \frac{\sqrt{250}}{\sqrt{475}} = \frac{\sqrt{10}}{\sqrt{19}}.$$

Then the angle $\theta$ is given by

$$\theta = \cos^{-1}\sqrt{\frac{10}{19}}.$$

(d) The angle in the vertical direction is

$$\frac{\pi}{2} - \theta = \frac{\pi}{2} - \cos^{-1}\sqrt{\frac{10}{19}}.$$
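The numbers in Solution 1.2.1 can be reproduced with a few lines of code. This is only a sketch under the assumption that positions are stored as 3-tuples in feet; the names `sub`, `dot`, `norm`, `sight` and `horizontal` are hypothetical helpers introduced for the illustration.

```python
import math

def sub(p, q):
    """Componentwise difference p - q."""
    return tuple(a - b for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

eyes = (1, 2, 5)       # position of the girl's eyes
bird = (10, 15, 20)    # position of the bird

sight = sub(bird, eyes)              # part (a): vector from the eyes to the bird
horizontal = sub((10, 15, 5), eyes)  # part (c): horizontal vector at eye level

# Angle between the line of sight and the horizontal direction, from (1.2.1).
cos_theta = dot(sight, horizontal) / (norm(sight) * norm(horizontal))
theta = math.acos(cos_theta)

# Part (d): angle between the line of sight and the vertical direction.
vertical_angle = math.pi / 2 - theta
```

As a cross-check, the cosine of `vertical_angle` should agree with the direct computation $15/\sqrt{475}$, the vertical component of the line of sight divided by its length.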




1.2.6 Orthogonal and orthonormal vectors 


Definition 1.2.2 (Orthogonal vectors). Two real vectors U and V are said to be orthogonal
to each other if the angle between them is $\frac{\pi}{2} = 90°$ or equivalently, if $\cos\theta = 0$ or
equivalently, if U.V = 0.


It follows, trivially, that every vector is orthogonal to a null vector since the dot 
product is zero. 


Definition 1.2.3 (Orthonormal system of vectors). A system of real vectors $U_1,\ldots,U_k$
is said to be an orthonormal system if $U_i.U_j = 0$ for all i and j, $i \ne j$ (all different vectors
are orthogonal to each other or they form an orthogonal system) and in addition, $\|U_j\| = 1$,
j = 1,2,...,k (all vectors have unit length).
As an illustrative example, consider the vectors

$$U_1 = (1,1,1), \quad U_2 = (1,0,-1), \quad U_3 = (1,-2,1).$$

Then

$$U_1.U_2 = (1)(1) + (1)(0) + (1)(-1) = 0;$$

$$U_1.U_3 = (1)(1) + (1)(-2) + (1)(1) = 0;$$

$$U_2.U_3 = (1)(1) + (0)(-2) + (-1)(1) = 0.$$

Thus $U_1, U_2, U_3$ form an orthogonal system. Let us normalize the vectors in order to
create an orthonormal system. Let us compute the lengths

$$\|U_1\| = \sqrt{(1)^2 + (1)^2 + (1)^2} = \sqrt{3}, \quad \|U_2\| = \sqrt{2}, \quad \|U_3\| = \sqrt{6}.$$

Consider the vectors

$$V_1 = \frac{1}{\|U_1\|}U_1 = \frac{1}{\sqrt{3}}(1,1,1) = \left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right),$$

$$V_2 = \frac{1}{\|U_2\|}U_2 = \frac{1}{\sqrt{2}}(1,0,-1),$$

$$V_3 = \frac{1}{\|U_3\|}U_3 = \frac{1}{\sqrt{6}}(1,-2,1).$$

Then $V_1, V_2, V_3$ form an orthonormal system.
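The normalization above can be verified by computing all pairwise dot products of the normalized vectors; for an orthonormal system this table has ones on the diagonal and zeros elsewhere. The sketch below assumes tuples for vectors, and the names `normalize` and `gram` are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(u):
    """Divide a non-null vector by its length to obtain a unit vector."""
    length = math.sqrt(dot(u, u))
    return tuple(x / length for x in u)

U1, U2, U3 = (1, 1, 1), (1, 0, -1), (1, -2, 1)
V = [normalize(u) for u in (U1, U2, U3)]

# Table of all dot products V_i.V_j (rows and columns indexed by i and j).
gram = [[dot(a, b) for b in V] for a in V]
```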
As another example, consider the vectors

$$e_1 = (1,0,\ldots,0), \quad e_2 = (0,1,0,\ldots,0), \quad \ldots, \quad e_n = (0,\ldots,0,1).$$

Then evidently

$$e_i.e_j = 0, \ i \ne j; \quad \|e_i\| = 1, \quad i,j = 1,\ldots,n.$$

Hence $e_1,\ldots,e_n$ is an orthonormal system.


Definition 1.2.4 (Basic unit vectors). The above vectors $e_1,\ldots,e_n$ are called the basic
unit vectors in n-space. [One could have written them as column vectors as well.]


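Engineers often use the notation

$$\vec{i} = (1,0), \quad \vec{j} = (0,1) \quad \text{or} \quad \vec{i} = \binom{1}{0}, \quad \vec{j} = \binom{0}{1} \qquad (1.2.4)$$

to denote the basic unit vectors in 2-space and

$$\vec{i} = (1,0,0), \quad \vec{j} = (0,1,0), \quad \vec{k} = (0,0,1) \quad \text{or} \quad
\vec{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad
\vec{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad
\vec{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \qquad (1.2.5)$$

to denote the basic unit vectors in 3-space. One interesting property is the following: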


(ii) Any n-vector can be written as a linear combination of the basic unit vectors
$e_1,\ldots,e_n$.


For example, consider a general 2-vector U = (a,b). Then

$$ai + bj = a(1,0) + b(0,1) = (a,0) + (0,b) = (a,b) = U. \qquad (1.2.6)$$

If V = (a,b,c) is a general 3-vector then

$$ai + bj + ck = a(1,0,0) + b(0,1,0) + c(0,0,1)
= (a,0,0) + (0,b,0) + (0,0,c) = (a,b,c) = V. \qquad (1.2.7)$$

Note that the same notations i and j are used for the unit vectors in 2-space as well as
in 3-space. There is no room for confusion since we will not be mixing 2-vectors and
3-vectors at any stage when these are used. In general, we can state the following result.
Let U be an n-vector with the elements $(u_1,\ldots,u_n)$ then

$$U = u_1e_1 + \cdots + u_ne_n. \qquad (1.2.8)$$


[Either all row vectors or all column vectors.] 

The geometry of the above result can be illustrated as follows: We take a 2-space 
for convenience. 

The vector i is in the horizontal direction with unit length. Then ai will be of length
|a| and in the same direction if a > 0 and in the opposite direction if a < 0. Similarly j
is a unit vector in the vertical direction and bj is of length |b| and in the same direction
if b > 0 and in the opposite direction if b < 0 as shown in Figure 1.2.5. Then the point
(a,b), as an arrowhead, is ai + bj.


Figure 1.2.5: Geometry of linear combinations.


If the angle the vector


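$$U = (a,b) = ai + bj$$

makes with the x-axis is $\theta$ then, taking a > 0 and b > 0,

$$\cos\theta = \frac{(ai + bj).(ai)}{\|ai + bj\|\,\|ai\|} = \frac{(a)(a) + (b)(0)}{\sqrt{a^2 + b^2}\,\sqrt{a^2}}
= \frac{a}{\sqrt{a^2 + b^2}} \qquad (1.2.9)$$

and

$$\sin\theta = \sqrt{1 - \cos^2\theta} = \frac{b}{\sqrt{a^2 + b^2}}. \qquad (1.2.10)$$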


Observe that (1.2.9) and (1.2.10) are consistent with the notions in ordinary trigono- 
metrical calculations as well. 


1.2.7 Projections 


If U = (a,b) then the projection of U in the horizontal direction is

$$a = \sqrt{a^2 + b^2}\cos\theta = \|U\|\cos\theta$$

which is the shadow on the x-axis if light beams come parallel to the y-axis and hit
the vector (arrowhead), and the projection in the vertical direction is

$$b = \sqrt{a^2 + b^2}\sin\theta = \|U\|\sin\theta$$

which is the shadow on the y-axis if light beams come parallel to the x-axis and hit
the vector. These results hold in n-space also. Consider a plane on which the vector
V in n-space lies. Consider a horizontal and a vertical direction in this plane with the
tail-end of the vector at the origin and let $\theta$ be the angle V makes with the horizontal
direction. Then

$$\|V\|\cos\theta = \text{projection of } V \text{ in the horizontal direction} \qquad (1.2.11)$$

and

$$\|V\|\sin\theta = \text{projection of } V \text{ in the vertical direction.} \qquad (1.2.12)$$


In practical terms one can explain the horizontal and vertical components of a
vector as follows: Suppose that a particle is sitting at the position (0,0). A wind with
a speed of $5\cos 45° = \frac{5}{\sqrt{2}}$ units is blowing in the horizontal direction and a wind with a
speed of $5\sin 45° = \frac{5}{\sqrt{2}}$ units is blowing in the vertical direction. Then the particle will
move at a 45° angle to the x-axis at a speed of 5 units.




Figure 1.2.6: Movement of a particle. 


Consider two arbitrary vectors U and V (coordinate-free definitions). What is the
projection of V in the direction of U? We can move V parallel to itself so that the tail-end
of V coincides with the tail-end of U. Consider the plane where these two vectors
lie and let $\theta$ be the angle this displaced V makes with U. Then the projection of V onto
U is $\|V\|\cos\theta$ as shown in Figure 1.2.6 (b). But

$$\cos\theta = \frac{U.V}{\|U\|\,\|V\|} \ \Rightarrow \ \|V\|\cos\theta = \frac{U.V}{\|U\|} = \text{projection of } V \text{ onto } U. \qquad (1.2.13)$$

If U is a unit vector then $\|U\| = 1$ and then the projection of V in the direction of U
is the dot product between U and V.


Definition 1.2.5 (Projection vector of V in the direction of a unit vector U). A vector
in the direction of U with a length equal to $\|V\|\cos\theta$, the projection of V onto U, is
called the projection vector of V in the direction of U.


Then the projection vector of V in the direction of U is given by

$$(U.V)U \quad \text{if } U \text{ is a unit vector}$$

and

$$\frac{(U.V)}{\|U\|^2}U \quad \text{if } U \text{ is any non-null vector.} \qquad (1.2.14)$$


Example 1.2.2. Evaluate the projection vector of V in the direction of U if


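(a) $V = 2i + j - k$, $U = \frac{1}{\sqrt{3}}(i + j + k)$;
(b) $V = i - j + k$, $U = 2i + j + k$;
(c) $V = i + j + k$, $U = i - k$.


Solution 1.2.2. (a) Here U is a unit vector and hence the required vector is

$$(U.V)U = \left[(2,1,-1).\left(\tfrac{1}{\sqrt{3}},\tfrac{1}{\sqrt{3}},\tfrac{1}{\sqrt{3}}\right)\right]
\left(\tfrac{1}{\sqrt{3}}i + \tfrac{1}{\sqrt{3}}j + \tfrac{1}{\sqrt{3}}k\right)
= \frac{2}{\sqrt{3}}\left(\tfrac{1}{\sqrt{3}}i + \tfrac{1}{\sqrt{3}}j + \tfrac{1}{\sqrt{3}}k\right) = \frac{2}{3}(i + j + k).$$

(b) Here U is not a unit vector. Let us create a unit vector in the direction of U,
namely

$$U_1 = \frac{U}{\|U\|} = \frac{1}{\sqrt{6}}(2i + j + k).$$

Now apply the formula on V and $U_1$. The required vector is the following:

$$(U_1.V)U_1 = \left[(1,-1,1).\left(\tfrac{2}{\sqrt{6}},\tfrac{1}{\sqrt{6}},\tfrac{1}{\sqrt{6}}\right)\right]
\left(\tfrac{2}{\sqrt{6}}i + \tfrac{1}{\sqrt{6}}j + \tfrac{1}{\sqrt{6}}k\right)
= \frac{2}{\sqrt{6}}\cdot\frac{1}{\sqrt{6}}(2i + j + k) = \frac{1}{3}(2i + j + k).$$

(c) Here V.U = (1,1,1).(1,0,-1) = 0. Hence the projection vector is the null vector.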
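The projection vector for a general non-null U can be computed directly from the dot products, without forming the unit vector first. The sketch below is one way to write this, assuming coefficient tuples in the order (i, j, k); the function name `projection_vector` is hypothetical. It is applied to cases (b) and (c) of Example 1.2.2.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def projection_vector(v, u):
    """Projection vector of v in the direction of the non-null vector u.

    Equivalent to (U1.V)U1 with U1 = u/||u||, since ||u||^2 = u.u.
    """
    scale = dot(u, v) / dot(u, u)
    return tuple(scale * x for x in u)

# Case (b): V = i - j + k, U = 2i + j + k.
proj_b = projection_vector((1, -1, 1), (2, 1, 1))

# Case (c): V = i + j + k, U = i - k, which are orthogonal.
proj_c = projection_vector((1, 1, 1), (1, 0, -1))
```

Dividing by `dot(u, u)` avoids taking a square root twice, which is why no call to a length function appears in this sketch.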


Definition 1.2.6 (Velocity vector). In the language of engineers and physicists, the ve- 
locity is a vector with a certain direction and magnitude (length of the vector) and 
speed is the magnitude of the velocity vector. 


For example, if V = (a,b) is the velocity vector as in Figure 1.2.6 then the direction
of the vector is shown by the arrowhead there and the speed in this case is $\sqrt{a^2 + b^2} = \|V\|$.
If the velocity vector of a wind is $V = 2i + j + k$ in a 3-space then its speed is

$$\|V\| = \sqrt{(2)^2 + (1)^2 + (1)^2} = \sqrt{6}.$$


Example 1.2.3. A plane is flying straight East horizontally at a speed of 200 km/hour 
and another plane is flying horizontally North-East at a speed of 600 km/hour. Draw 
the velocity vectors for both the planes.


Figure 1.2.7: Velocity vectors.


Solution 1.2.3. If the velocity vectors for the two planes are denoted by U and V respectively,
as shown in Figure 1.2.7 then the given information is that

$$\|U\| = 200 \quad \text{and} \quad \|V\| = 600.$$

If the direction of U is taken as the x-axis on the plane where the two vectors lie (displaced
if necessary so that the tail-ends meet at (0,0)) then on this plane

$$U = 200i \quad \text{and} \quad V = (600\cos 45°)i + (600\sin 45°)j = \frac{600}{\sqrt{2}}i + \frac{600}{\sqrt{2}}j.$$


Example 1.2.4. A sail boat is steered to move straight East. There is a wind with a
velocity in the North-East direction and with a speed of 50 km/hour. What is the speed
of the boat if (a) the only force acting on the boat is the wind, (b) in addition to the wind
the sail boat has a motor which is set for a speed of 20 km/hour?


Solution 1.2.4. (a) The only component here is the component of the wind velocity
vector in the direction of the boat which is $\|V\|\cos\theta$ if V is the velocity vector and
$\theta$ is the angle V makes with the East direction (East direction is taken as the x-axis
direction). We are given $\|V\| = 50$ and $\theta = 45°$. Then the speed of the boat is
$\|V\|\cos\theta = \frac{50}{\sqrt{2}}$ and the velocity is $U = \frac{50}{\sqrt{2}}i$.

(b) In this case the above component plus the speed set by the engine are there.
Then the combined speed is $\frac{50}{\sqrt{2}} + 20$ and the velocity vector is

$$U = \left(\frac{50}{\sqrt{2}} + 20\right)i.$$
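The two speeds in Solution 1.2.4 follow from the single formula $\|V\|\cos\theta$. A small sketch, assuming speeds in km/hour and angles in degrees; the function name `component_along` is an illustrative choice and not part of the text.

```python
import math

def component_along(speed, angle_degrees):
    """Component of a velocity of the given speed along a direction
    that makes the given angle (in degrees) with the velocity."""
    return speed * math.cos(math.radians(angle_degrees))

# (a) Only the wind acts: 50 km/hour blowing North-East, boat heading East.
wind_part = component_along(50, 45)

# (b) The motor adds 20 km/hour in the East direction.
with_motor = wind_part + 20
```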


1.2.8 Work done 


When a force of magnitude F is applied on an object and the object is moved in the
same direction of the force for a distance d then we say that the work done is Fd (F multiplied
by d). For example if the force vector has the magnitude 20 units and the distance
moved in the same direction of the force is 10 units then the work done is 200
units (force, distance and work are measured in different units such as force in newtons,
distance in meters and work in joules). Suppose that the force vector is in
a certain direction and the distance moved is in another direction then what will be
the work done? Let F be the force vector and d the displacement vector as shown in
Figure 1.2.8.


Figure 1.2.8: Work done.

Let the force vector F make an angle $\theta$ with the displacement vector d. Then the
projection of F in the direction of d is $\|F\|\cos\theta$ and the projection vector is $\|F\|(\cos\theta)U$
where U is a unit vector in the direction of d. [The component vector of F in the direction
perpendicular to d is $F - \|F\|(\cos\theta)U$, of length $\|F\|\sin\theta$.
This is not required in our computations.] Then the work done, denoted by w, is

$$w = \|F\|\cos\theta\,\|d\| = \|F\|\,\frac{(F.d)}{\|F\|\,\|d\|}\,\|d\| = F.d. \qquad (1.2.15)$$

Example 1.2.5. The ground force F = 5i + 2j of a wind moved a stone in the direction
of the displacement d = i + 3j. What is the work done by this wind in moving the stone?


Solution 1.2.5. According to (1.2.15) the work done is
w = F.d = (5,2).(1,3) = (5)(1) + (2)(3) = 11.
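Formula (1.2.15) says that the work can be computed either as a dot product or as $\|F\|\cos\theta\,\|d\|$. The sketch below checks both routes on the data of Example 1.2.5; obtaining the angle from `math.atan2` is an assumption made here so that the second route does not itself use the dot product.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

F = (5, 2)   # force vector 5i + 2j
d = (1, 3)   # displacement vector i + 3j

# Route 1: work as the dot product F.d.
w = dot(F, d)

# Route 2: work as ||F|| cos(theta) ||d||, with theta the angle between
# F and d, found from the angles each vector makes with the x-axis.
theta = math.atan2(d[1], d[0]) - math.atan2(F[1], F[0])
w_from_angle = math.hypot(*F) * math.cos(theta) * math.hypot(*d)
```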


Example 1.2.6. Consider a triangle ABC with the angles denoted by A, B, C and the
lengths of the sides opposite to these angles by a, b, c, as shown in Figure 1.2.9. Then
show that

$$a^2 = b^2 + c^2 - 2bc\cos A.$$


Solution 1.2.6. Consider the vectors $\vec{AB}$ and $\vec{AC}$, starting from A and going to B and
C respectively.
Then the vector $\vec{BC} = \vec{AC} - \vec{AB}$. Therefore

$$\|\vec{BC}\|^2 = \|\vec{AC} - \vec{AB}\|^2 = \|\vec{AC}\|^2 + \|\vec{AB}\|^2 - 2\|\vec{AC}\|\,\|\vec{AB}\|\cos A.$$


Figure 1.2.9: A triangle.


That is, $a^2 = b^2 + c^2 - 2bc\cos A$. Here we have used the fact that the square of the length
is the dot product with itself:

$$\begin{aligned}
a^2 &= \|\vec{AC} - \vec{AB}\|^2 \\
&= (\vec{AC} - \vec{AB}).(\vec{AC} - \vec{AB}) \\
&= (\vec{AC}.\vec{AC}) - (\vec{AC}.\vec{AB}) - (\vec{AB}.\vec{AC}) + (\vec{AB}.\vec{AB}) \\
&= \|\vec{AC}\|^2 + \|\vec{AB}\|^2 - 2(\vec{AC}.\vec{AB}) \\
&= \|\vec{AC}\|^2 + \|\vec{AB}\|^2 - 2\|\vec{AC}\|\,\|\vec{AB}\|\cos A \\
&= b^2 + c^2 - 2bc\cos A.
\end{aligned}$$


Exercises 1.2 


1.2.1. Give geometric representation to the following vectors: 


(a) U = 2i - 3j, (b) 2U, (c) -2U,
(d) V = i + j, (e) U + V,
(f) U - V, (g) V - 2U, (h) 2U + 3V.


1.2.2. Compute the angle between the following vectors: 


(c) U = (1,-1,2,3,5,-1), V = (2,0,0,-1,1,2).


1.2.3. Verify the Cauchy-Schwarz inequality for U and V in the three cases in Exercise 1.2.2.


1.2.4. Normalize the following vector U, then construct two vectors which are orthogonal
to each other and both orthogonal to U, where U = (1,1,1,1).


1.2.5. Given the two vectors $U_1 = (1,1,1,1)$ and $U_2 = (1,2,-1,1)$ construct two vectors
$V_1$ and $V_2$ such that $V_1$ is the normalized $U_1$, $V_2$ is a normalized vector orthogonal to
$V_1$ and both $V_1$ and $V_2$ are linear functions of $U_1$ and $U_2$.




1.2.6. Let $P = (x_0,y_0,z_0)$ be a fixed point in 3-space and Q = (x,y,z) an arbitrary point in
3-space. Construct the vector going from P to Q. Derive the equation to the plane on which
the vector PQ lies and to which another vector N = (a,b,c) is normal
(Normal to a plane means orthogonal to every vector lying on the plane).


1.2.7. If x - y + z = 7 is a plane, (i) is the point (1,1,1) on this plane? (ii) construct a
normal to this plane with length 5, (iii) construct a plane parallel to the given plane 
and passing through the point (1,1, 2), (iv) construct a plane orthogonal to the given 
plane and passing through the point (1, -1, 4). 


1.2.8. Derive the equation to the plane passing through the points 
(1, 1,-1), (2, 1,2), (2, 1,0). 


1.2.9. Find the area of the parallelogram formed by the vectors (by completing it as in
Figure 1.2.2 on the plane determined by the two vectors),

U = 2i + j - k and V = i - j + 3k.


1.2.10. Find the work done by the force F = 2i - j + 3k for the displacement d = 3i + j - k.


1.2.11. A boat is trying to cross a river at a speed of 20 miles/hour straight across. The 
river flow downstream is 10 miles/hour. Evaluate the eventual direction and speed of 
the boat. 


1.2.12. In Exercise 1.2.11 if the river flow speed is the same what should be the direction 
and speed of the boat so that it can travel straight across the river? 


1.2.13. Evaluate the area of the triangle whose vertices are (1, 0,1), (2,1, 5), (1, —1, 2) by 
using vector method. 


1.2.14. Find the angle between the planes (angle between the normals to the planes)
x + y - z = 7 and 2x + y - 3z = 5.


1.2.15. In some engineering problems of signal processing a concept called convolution
of two vectors is defined. Let $X = (x_1,\ldots,x_n)$ and $Y = (y_1,\ldots,y_n)$ be two row vectors
of the same order. Then the convolution, denoted by X * Y, is defined as follows: It is
again a 1 x n vector where the i-th element in X * Y is given by

$$x_1y_i + x_2y_{i-1} + \cdots + x_iy_1 + x_{i+1}y_n + x_{i+2}y_{n-1} + \cdots + x_ny_{i+1}.$$

For example, for n = 2

$$X * Y = (x_1,x_2) * (y_1,y_2) = (x_1y_1 + x_2y_2, \ x_1y_2 + x_2y_1).$$

(a) Write down the explicit expression for $(x_1,x_2,x_3) * (y_1,y_2,y_3)$.
(b) Show that the operator * is commutative as well as associative for a general n.
(c) Evaluate (1,0,-1,2) * (3,4,5,-2).
1.2.16. Find the angle between the planes
x - y + z = 2 and 2x + 3y - 4z = 8.


1.2.17. Evaluate the area of the triangle whose vertices are (1,1,1), (2,5,3), (1,-1,-1).


1.2.18. Evaluate the area of the parallelogram determined by the vectors U = (1, -1,2,5) 
and V = (1,1,-1,-1). 


1.3 Linear dependence and linear independence of vectors 


Consider the vectors $U_1 = (1,0,-1)$ and $U_2 = (1,1,1)$. For arbitrary scalars $a_1$ and $a_2$ let
us try to solve the equation

$$a_1U_1 + a_2U_2 = O \qquad (1.3.1)$$

to see whether there exist nonzero $a_1$ and $a_2$ such that (1.3.1) is satisfied.

$$a_1U_1 + a_2U_2 = O \ \Rightarrow \ a_1(1,0,-1) + a_2(1,1,1) = O = (0,0,0).$$

That is,

$$(a_1 + a_2, \ a_2, \ -a_1 + a_2) = (0,0,0) \ \Rightarrow \ a_1 + a_2 = 0, \quad a_2 = 0, \quad -a_1 + a_2 = 0.$$

The only values of $a_1$ and $a_2$ satisfying the three equations $a_1 + a_2 = 0$, $a_2 = 0$ and
$-a_1 + a_2 = 0$ are $a_1 = 0$ and $a_2 = 0$. This means that the only solution for $a_1$ and $a_2$
in (1.3.1) is $a_1 = 0$ and $a_2 = 0$. Observe that $a_1 = 0$, $a_2 = 0$ is always a solution to the
equation (1.3.1). But here we have seen that $a_1 = 0$, $a_2 = 0$ is the only solution. Now, let
us look at another situation. Consider the vectors


$$U_1 = (1,1,1), \quad U_2 = (1,-1,2), \quad U_3 = (2,0,3).$$

Solve the equation

$$a_1U_1 + a_2U_2 + a_3U_3 = O \qquad (1.3.2)$$

for $a_1, a_2, a_3$. Then

$$a_1U_1 + a_2U_2 + a_3U_3 = O \ \Rightarrow \ a_1(1,1,1) + a_2(1,-1,2) + a_3(2,0,3) = (0,0,0).$$

That is,

$$(a_1 + a_2 + 2a_3, \ a_1 - a_2, \ a_1 + 2a_2 + 3a_3) = (0,0,0).$$

This means,

$$a_1 + a_2 + 2a_3 = 0, \qquad \text{(i)}$$
$$a_1 - a_2 = 0, \qquad \text{(ii)}$$
$$a_1 + 2a_2 + 3a_3 = 0. \qquad \text{(iii)}$$

From (ii), $a_1 = a_2$; substituting in (i), $a_1 = a_2 = -a_3$; substituting in (iii) the equation is
satisfied. Then there are infinitely many non-zero $a_1, a_2, a_3$ for which (1.3.2) is satisfied.
For example, $a_1 = 1 = a_2$, $a_3 = -1$ will satisfy (1.3.2). In the above considerations we have
two systems of vectors. In one system the only possibility for the coefficient vector is
the null vector which means that no vector can be written as a linear function of the
other vectors. In the other case the coefficient vector is not null which means that at
least one of the vectors there can be written as a linear combination of others.
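The non-null coefficient vector found above can be checked by forming the linear combination explicitly. This is a minimal sketch assuming tuples for vectors; the helper name `linear_combination` is illustrative.

```python
def linear_combination(coeffs, vectors):
    """Return a_1*U_1 + ... + a_k*U_k for coefficient and vector lists."""
    n = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(coeffs, vectors))
                 for i in range(n))

# Second system: a non-null coefficient vector gives the null vector.
U1, U2, U3 = (1, 1, 1), (1, -1, 2), (2, 0, 3)
combo = linear_combination((1, 1, -1), (U1, U2, U3))

# Any scalar multiple of (1, 1, -1) works as well, e.g. (3, 3, -3).
combo_scaled = linear_combination((3, 3, -3), (U1, U2, U3))

# First system: the same kind of attempt with non-zero coefficients fails.
first_system = linear_combination((1, 1), ((1, 0, -1), (1, 1, 1)))
```

The last line tries only one non-zero choice of coefficients for the first system; it illustrates, but does not prove, that only the null coefficient vector works there. The proof is the elimination argument given in the text.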


Definition 1.3.1 (Linear independence). Let $U_1, U_2, \ldots, U_k$ be k given non-null
n-vectors, where k is finite. Consider the equation

$$a_1U_1 + a_2U_2 + \cdots + a_kU_k = O \qquad (1.3.3)$$

where $a_1,\ldots,a_k$ are scalars. If the only possibility for (1.3.3) to hold is when $a_1 = 0,\ldots,a_k = 0$
then the vectors $U_1,\ldots,U_k$ are called linearly independent. If there exists
at least one non-null vector $(a_1,\ldots,a_k)$ such that (1.3.3) is satisfied then the system of
vectors $U_1,\ldots,U_k$ is linearly dependent.


If a non-null vector $(a_1,\ldots,a_k)$ exists then at least one of the elements is nonzero.
Let $a_1 \ne 0$. Then from (1.3.3)

$$U_1 = -\frac{a_2}{a_1}U_2 - \cdots - \frac{a_k}{a_1}U_k. \qquad (1.3.4)$$

That is, $U_1$ can be written as a linear function of $U_2,\ldots,U_k$. Note that not all $a_2,\ldots,a_k$
can be zeros. If they are all zeros then from (1.3.4) $U_1$ is a null vector. But a null vector
is not included in our definition. Thus at least one of them can be written as a linear
function of the others if $U_1,\ldots,U_k$ are linearly dependent. If they are linearly independent
then none can be written as a linear function of the others.


(i) A null vector is counted among dependent vectors. A set consisting of one non- 
null vector is counted as an independent system of vectors. 


Example 1.3.1. Show that the basic unit vectors $e_1,\ldots,e_n$ are linearly independent.

Solution 1.3.1. Consider the equation

$$a_1e_1 + \cdots + a_ne_n = O \ \Rightarrow$$
$$a_1(1,0,\ldots,0) + \cdots + a_n(0,\ldots,0,1) = (0,\ldots,0) \ \Rightarrow$$
$$(a_1,\ldots,a_n) = (0,\ldots,0) \ \Rightarrow$$
$$a_1 = 0,\ldots,a_n = 0$$

is the only solution, which means that $e_1,\ldots,e_n$ are linearly independent.


Example 1.3.2. Show that a system of non-null mutually orthogonal vectors is linearly
independent.


Solution 1.3.2. Let $V_1,\ldots,V_k$ be a system of non-null mutually orthogonal vectors. Consider the
equation

$$a_1V_1 + \cdots + a_kV_k = O.$$

Take the dot product on both sides with respect to $V_1$. Then we have

$$a_1V_1.V_1 + a_2V_2.V_1 + \cdots + a_kV_k.V_1 = O.V_1 = 0.$$

But $V_j.V_1 = 0$ for $j \ne 1$ and $V_1.V_1 = \|V_1\|^2 \ne 0$. This means that $a_1 = 0$. Similarly
$a_2 = 0,\ldots,a_k = 0$ which means that $V_1,\ldots,V_k$ are linearly independent. This is a very important
result.


(ii) Every set of mutually orthogonal non-null vectors is linearly independent.
(iii) Any finite collection of vectors containing the null vector is counted as a linearly
dependent system of vectors. If $S_1$ and $S_2$ are two finite collections of vectors where $S_1$
is a subset of $S_2$, that is, $S_1 \subset S_2$, then the following hold: If $S_1$ is a linearly dependent
system then $S_2$ is also a linearly dependent system. If $S_2$ is a linearly independent
system then $S_1$ is also a linearly independent system.


Example 1.3.3. Check the linear dependence of the following sets of vectors: 


(a) $U_1 = (1,2,1)$, $U_2 = (1,1,1)$;
(b) $U_1 = (1,-1,2)$, $U_2 = (1,1,0)$;
(c) $U_1 = (1,2,1)$, $U_2 = (1,-1,1)$, $U_3 = (3,3,3)$.


Solution 1.3.3. (a) For two vectors to be dependent one has to be a non-zero scalar
multiple of the other. Hence $U_1$ and $U_2$ here are linearly independent.

(b) By inspection $U_1.U_2 = 0$ and hence they are orthogonal and thereby linearly
independent.

(c) $U_1$ and $U_2$ are evidently linearly independent, being not multiples of each
other. By inspection $U_3 = 2U_1 + U_2$ and hence the set $\{U_1, U_2, U_3\}$ is a linearly dependent
system.

(iv) Linear dependence or independence in a system of vectors is not altered by
scalar multiplication of the vectors by non-zero scalars.


This result can be easily seen from the definition itself. Let the n-vectors $U_1,\ldots,U_k$ be
linearly independent. Then

$$a_1U_1 + \cdots + a_kU_k = O \ \Rightarrow \ a_1 = 0,\ldots,a_k = 0.$$

Let $c_1,\ldots,c_k$ be non-zero scalars. If $a_i = 0$ then $a_ic_i = 0$ and vice versa since $c_i \ne 0$,
i = 1,2,...,k. Thus

$$a_1(c_1U_1) + \cdots + a_k(c_kU_k) = O \ \Rightarrow \ a_1 = 0,\ldots,a_k = 0.$$

On the other hand, if $U_1,\ldots,U_k$ are linearly dependent then at least one of them can
be written as a linear function of the others. Let

$$U_1 = b_2U_2 + \cdots + b_kU_k$$

where $b_2,\ldots,b_k$ are some constants, at least one of them nonzero. Then for
$c_1 \ne 0,\ldots,c_k \ne 0$

$$c_1U_1 = \frac{c_1b_2}{c_2}(c_2U_2) + \cdots + \frac{c_1b_k}{c_k}(c_kU_k).$$

Thus $c_1U_1,\ldots,c_kU_k$ are linearly dependent.
We have another important result on linear independence.


(v) Linear independence or dependence in a system of vectors is not altered by 
adding a scalar multiple of any vector in the system to any other vector in the sys- 
tem. 


This result is easy to establish. Let the system $U_1,\ldots,U_k$ of n-vectors be linearly independent.
Then

$$a_1U_1 + \cdots + a_kU_k = O \ \Rightarrow \ a_1 = 0,\ldots,a_k = 0.$$

Now, consider a new system $U_1, cU_1 + U_2, U_3, \ldots, U_k$. [That is, $U_2$ is replaced by $cU_1 + U_2$,
$c \ne 0$. In other words, $cU_1$ is added to $U_2$.] Consider the equation

$$a_1U_1 + a_2(cU_1 + U_2) + a_3U_3 + \cdots + a_kU_k = O.$$

That is,

$$(a_1 + ca_2)U_1 + a_2U_2 + \cdots + a_kU_k = O.$$

Then since $U_1,\ldots,U_k$ are linearly independent $a_1 + ca_2 = 0$, $a_2 = 0$, ..., $a_k = 0$ which
means $a_1 = 0$ also, which establishes that the system of vectors $U_1, cU_1 + U_2, U_3, \ldots, U_k$
is linearly independent. A similar procedure establishes that if the original system is
linearly dependent then the new system is also linearly dependent.

By combining the results (iv) and (v) above we can have the following result:


(vi) Consider a finite collection of n-vectors. If any number of vectors in this collec- 
tion are multiplied by nonzero scalars or a linear function of any number of them is 
added to any member in the set, linear independence or dependence in the system 
is preserved. That is, if the original system is linearly independent then the new 
system is also linearly independent and if the original system is dependent then 
the new system is also linearly dependent. 


Example 1.3.4. Check to see whether the following system of vectors is linearly inde- 
pendent or dependent: 

$U_1 = (1,0,2,-1,5)$

$U_2 = (-1,1,1,-1,2)$

$U_3 = (2,1,7,-4,17)$
Solution 1.3.4. Since nonzero scalar multiplication and addition do not alter independence
or dependence let us create new systems of vectors. In what follows the
following standard notations will be used:


A few standard notations


"$\alpha(i) \Rightarrow$" means the i-th vector multiplied by $\alpha$ (1.3.5)


In this operation the i-th vector in the set is replaced by $\alpha$ (Greek letter alpha) times
the original i-th vector. For example "$-3(1) \Rightarrow$" means that "the first vector multiplied
by -3 gives", that is, the new first vector is the original first vector multiplied by -3.


"$\alpha(i) + (j) \Rightarrow$" means $\alpha$ times the i-th vector added to the j-th vector (1.3.6)


In this operation the original i-th vector remains the same whereas the new j-th vector
is the original j-th vector plus $\alpha$ times the original i-th vector. Let us apply these types
of operations on $U_1, U_2, U_3$, remembering that linear independence or dependence is
preserved.


(1) + (2) ⇒    U_1 = (1, 0, 2, -1, 5)
               V_2 = (0, 1, 3, -2, 7)
               U_3 = (2, 1, 7, -4, 17)


In the above operation the second vector U_2 is replaced by U_2 + U_1 = V_2. Let us continue
the operations.


-2(1) + (3) ⇒  U_1 = (1, 0, 2, -1, 5)
               V_2 = (0, 1, 3, -2, 7)
               V_3 = (0, 1, 3, -2, 7)


In the set U_1, V_2, V_3 we will do the next operation.


-(2) + (3) ⇒   U_1 = (1, 0, 2, -1, 5)
               V_2 = (0, 1, 3, -2, 7)
               W_3 = (0, 0, 0, 0, 0)


Here W_3 is obtained by adding (-1) times V_2 to V_3 or replacing V_3 by V_3 - V_2 = W_3. By
the above sequence of operations W_3 has become a null vector which by definition is
dependent. Hence the original system U_1, U_2, U_3 is a linearly dependent system.
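

The three operations of Solution 1.3.4 can be replayed mechanically. The following is a
minimal Python sketch and not part of the original text; the helper name scale_add is
hypothetical and simply encodes the notation "α(i) + (j) ⇒" of (1.3.6).

```python
def scale_add(alpha, src, dst):
    """Return dst + alpha * src, i.e. the operation "alpha(i) + (j) =>" of (1.3.6)."""
    return [d + alpha * s for s, d in zip(src, dst)]

U1 = [1, 0, 2, -1, 5]
U2 = [-1, 1, 1, -1, 2]
U3 = [2, 1, 7, -4, 17]

V2 = scale_add(1, U1, U2)    # (1) + (2)   replaces the second vector
V3 = scale_add(-2, U1, U3)   # -2(1) + (3) replaces the third vector
W3 = scale_add(-1, V2, V3)   # -(2) + (3)  replaces the third vector again

print(V2)  # [0, 1, 3, -2, 7]
print(V3)  # [0, 1, 3, -2, 7]
print(W3)  # [0, 0, 0, 0, 0]
```

Tracing the operations backwards gives the explicit relation U_3 = 3U_1 + U_2, which is
another way to see the dependence.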


Example 1.3.5. Check the linear dependence or independence of the following sys- 
tem of vectors: 


U_1 = (2, -1, 1, 1, 3, 4)
U_2 = (5, 2, 1, -1, 2, 1)
U_3 = (1, -1, 1, 1, 1, 4)


Solution 1.3.5. Since linear dependence or independence is not altered by the order
in which the vectors are selected we will write U_3 first and write only the elements in
3 rows and 6 columns as follows, rather than naming them as U_3, U_1, U_2:


    1 -1  1  1  1   4
    2 -1  1  1  3   4
    5  2  1 -1  2   1


We have written them in the order U_3, U_1, U_2 to bring a convenient number, namely
1, at the first row first column position. This does not alter linear independence or
dependence in the system. Now, we will carry out more than one operation at a time.
[We add (-2) times the first row to the second row and (-5) times the first row to the
third row. The first row remains the same. The result is the following:]


-2(1) + (2); -5(1) + (3) ⇒
    1 -1  1  1  1   4
    0  1 -1 -1  1  -4
    0  7 -4 -6 -3 -19


[On the new configuration we add the second row to the first row and (-7) times the 
second row to the third row. The second row remains the same. The net result is the 
following:] 


(2) + (1); -7(2) + (3) ⇒
    1  0  0  0   2   0
    0  1 -1 -1   1  -4
    0  0  3  1 -10   9


[The third row is divided by 3. The third row changes.]


(1/3)(3) ⇒
    1  0  0    0     2    0
    0  1 -1   -1     1   -4
    0  0  1   1/3 -10/3   3


[This operation is done to bring a convenient number at the third column position on
the third row. Now we add the new third row to the second row. The new third row
remains the same.]


(3) + (2) ⇒
    1  0  0    0     2    0
    0  1  0  -2/3  -7/3  -1
    0  0  1   1/3 -10/3   3


The aim in the above sequences of operations is to bring a unity at all leading diagonal
(the diagonal from the upper left-end corner down) positions, if possible. Interchanges
of rows can be done if necessary to achieve the above aim, because interchanges do
not alter the linear independence or dependence. During such a process if any row
becomes null then automatically the original system, represented by the starting rows,
is dependent. If no row becomes null during the process then at the end of the process
look at the final first, second, etc. columns. In our example above look at the first
column. No non-zero linear combination of the second and third rows can create a 1
at the first position. Hence the first row cannot be written as a linear function of the
second and third rows. Now look at the second column. By the same argument above
the second row cannot be written as a linear function of the first and third rows. Now
look at the third column. By the same argument the third row cannot be written as a
linear combination of the first and second rows. Hence all the three rows are linearly
independent or the original system {U_1, U_2, U_3} is a linearly independent system.

The above procedure is called a sweep-out procedure. Then the principles to re- 
member in a sweep-out procedure are the following: Assume that the system consists 
of m vectors, each is an n-vector. 


Principles in a sweep-out procedure 


(1) Write the given vectors as rows, interchange if necessary to bring a convenient 
nonzero number, 1 if available, at the first row first column position. Do not inter- 
change columns, the vectors will be altered. 

(2) Add suitable multiples of the first row to the second, third, ..., m-th row to make the 
first column elements, except the first element, zeros. 

(3) Start with the second row. Interchange 2nd, ..., m-th rows if necessary to bring a 
convenient nonzero number at the second position on the second row. 

(4) Add suitable multiples of the second row to the first row, third row, ..., m-th row to
make all elements in the second column, except the second element, zeros.

(5) Repeat the process with the third, fourth etc. rows until all the leading diagonal
elements are non-zeros, unities if possible.

(6) During the process if any row becomes null then shift it to the bottom position. If 
at any stage a vector has become null then the system is dependent. If all the lead- 
ing diagonal elements are non-zeros when all other elements in the corresponding 
columns are wiped out (made zeros) by the above process then the system is linearly 
independent. 

(7) If the first r, for some r, leading diagonal elements are non-zeros, none of the rows
has become null so far and the (r + 1)th elements in all the remaining rows are zeros
then continue the process with the (r + 2)th element on the (r + 1)th row and so on. If
no row has become null by the end of the whole process then all the rows are linearly
independent.

(8) Division of a row by a non-zero scalar usually brings in fractions. Hence multiply 
the rows with appropriate numbers to avoid fractions and to achieve the sweep-out 
process. 


The leading diagonal elements need not be brought to unities to check for linear de- 
pendence or independence. Only nonzero elements are to be brought to the diagonal 
positions, if possible. When doing the operations, try to bring the system to a triangu- 
lar format by reducing all elements below the leading diagonal to zeros, if possible. 
When the system is in a triangular format all elements above nonzero diagonal ele- 
ments can be simply put as zeros because this can always be achieved by operating 
with the last row first, wiping out all last column elements except the last column last
row element, then last but one column elements and so on. Thus all elements above 
nonzero diagonal elements can be simply put as zeros once the matrix is in a triangular 
format. 

(9) If the vectors to be checked for linear independence or dependence are column
vectors then write them as rows before executing a sweep-out process. This is done only
for convenience because operations on rows are easier to visualize.

(10) When doing a sweep-out process always write first the row that you are operating 
with because this row is not changing and others can change as a result of the op- 
erations. 
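

The principles above can be sketched in code. The following Python sketch is not from the
original text: the function names sweep_out and is_independent are hypothetical, and it
assumes the vectors are given as equal-length sequences of integers or rationals. It uses
exact fractions rather than the fraction-avoiding multiplications of principle (8), and it
always scales the leading element to unity.

```python
from fractions import Fraction

def sweep_out(vectors):
    """Sweep out the rows; return (final rows, number of non-null rows)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    m = len(rows)
    n = len(rows[0]) if rows else 0
    lead = 0  # index of the row that receives the next leading element
    for col in range(n):
        # Principles (1), (3): interchange rows to bring a nonzero element up.
        pick = next((r for r in range(lead, m) if rows[r][col] != 0), None)
        if pick is None:
            continue  # principle (7): nothing usable in this column, move on
        rows[lead], rows[pick] = rows[pick], rows[lead]
        p = rows[lead][col]
        rows[lead] = [x / p for x in rows[lead]]  # leading element becomes 1
        # Principles (2), (4): wipe out the rest of this column.
        for r in range(m):
            if r != lead and rows[r][col] != 0:
                f = rows[r][col]
                rows[r] = [a - f * b for a, b in zip(rows[r], rows[lead])]
        lead += 1
        if lead == m:
            break
    return rows, lead  # null rows, if any, end up at the bottom

def is_independent(vectors):
    """True if no row becomes null during the sweep-out."""
    _, non_null = sweep_out(vectors)
    return non_null == len(vectors)

# Example 1.3.4 (dependent) and Example 1.3.5 in the order U_3, U_1, U_2 (independent)
print(is_independent([(1, 0, 2, -1, 5), (-1, 1, 1, -1, 2), (2, 1, 7, -4, 17)]))  # False
print(is_independent([(1, -1, 1, 1, 1, 4), (2, -1, 1, 1, 3, 4), (5, 2, 1, -1, 2, 1)]))  # True
```

Exact rational arithmetic is chosen here so that "a row has become null" is a reliable test;
with floating point numbers a tolerance would be needed instead.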


Example 1.3.6. Check for linear independence or dependence in the following system 
of vectors: 


U_1 = (2, 0, 1, 5)
U_2 = (1, -1, 1, 1)
U_3 = (4, 2, 2, 8)


Solution 1.3.6. For convenience write in the order U_2, U_1, U_3 and write only the
elements and continue with the sweep-out process.


    1 -1  1  1
    2  0  1  5
    4  2  2  8

-2(1) + (2); -4(1) + (3) ⇒
    1 -1  1  1
    0  2 -1  3
    0  6 -2  4

2(1) ⇒
    2 -2  2  2
    0  2 -1  3
    0  6 -2  4

(2) + (1); -3(2) + (3) ⇒
    2  0  1  5
    0  2 -1  3
    0  0  1 -5

(3) + (2); -(3) + (1) ⇒
    2  0  0 10
    0  2  0 -2
    0  0  1 -5


The leading diagonal elements are 2,2,1 which are non-zeros and hence the system 
is linearly independent. [During the process above the first row is multiplied by 2 in 
order to avoid fractions in the rest of the operations. ] 

Note that in the above operations the row that you are operating with remains 
the same and the other rows, to which constant multiples are added, change. In the 
last form above, are all the four columns linearly independent? Evidently not. The last
column = 5(column 1) - (column 2) - 5(column 3).

If our aim is only to check for linear independence or dependence then we need 
to bring the original set to a triangular type format. In the second step above the op- 
eration 2(1) and in the third stage the operation (2) + (1) need not be done. That is, 


    1 -1  1  1
    0  2 -1  3
    0  6 -2  4

-3(2) + (3) ⇒
    1 -1  1  1
    0  2 -1  3
    0  0  1 -5


Now we have the triangular type format with nonzero diagonal elements. Note that the 
first row cannot be written as a linear function of the second and third rows. Similarly 
no row can be written as a linear function of the other two. At this stage if we wish to 
create a diagonal format for the first three columns then by using the third row one 
can wipe out all other elements in the third column, then by using the second row we 
can wipe out all other elements in the second column. In other words, we can simply 
replace all those elements by zeros, then only the last column will change. 


    1 -1  1  1
    0  2 -1  3
    0  0  1 -5

(operating with the third row) ⇒
    1 -1  0  6
    0  2  0 -2
    0  0  1 -5

(operating with the second row) ⇒
    1  0  0  5
    0  2  0 -2
    0  0  1 -5

⇒
    1  0  0  5
    0  1  0 -1
    0  0  1 -5


where the last step divides the second row by 2. Thus, the first three columns are made
basic unit vectors. The same procedure applies if we wish to create unit vectors in the
first r columns when there are r linearly independent rows.

At a certain stage, say the rth stage, suppose that all elements in the (r + 1)th column
below the rth row are zeros. Then start with a nonzero element in the remaining
configuration of the columns in the remaining rows and proceed to create a triangular
format. For example, consider the following situation:


    1  1 -1  1 -1  1  1           1  1 -1  1 -1  1  1
    0  1  0  2  1 -1  1           0  1  0  2  1 -1  1
    0  0  0  0  2  0  1     ⇒     0  0  0  1  0  0  4
    0  0  0  0  0  0  1           0  0  0  0  2  0  1
    0  0  0  1  0  0  4           0  0  0  0  0  0  1


The first two rows are evidently linearly independent. Our procedure of triangulariza- 
tion cannot proceed. Write the 5th row in the 3rd row position to get the matrix on 
the right above. Now we see that the new 3rd, 4th and 5th rows form a triangular type 
format. This shows that all the five rows are linearly independent. Note that by using 
the last row one can wipe out all other elements in the 7th column. Then by using the 
4th row we can wipe out all other elements in the 5th column. Then by using the 3rd 
row we can wipe out all other elements in the 4th column. Then by using the second 
row we can wipe out all other elements in the second column. Now, one can see lin- 
ear independence of all the five rows clearly. In the light of the above examples and 
discussions we can state the following result: 


(vii) There cannot be more than n mutually orthogonal n-vectors and there cannot 
be more than n linearly independent n-vectors. 


It is not difficult to establish this result. Consider the n-vectors U_1, ..., U_n, U_{n+1}, that is,
n + 1 vectors of n elements each. Write the n + 1 vectors as n + 1 rows and apply the
above sweep-out process. If the first n vectors are linearly independent then all the n 
leading diagonal spots have nonzero entries with all elements in the corresponding 
columns zeros. Thus automatically the (n + 1)th row becomes null. Hence the (n+ 1)th 
row depends on the other n rows or the maximum number of linearly independent 
n-vectors possible is n. 

If possible, let V_1, ..., V_{n+1} be mutually orthogonal n-vectors. From what we proved
just above not all these n + 1 vectors can be linearly independent. Then the (n + 1)th
can be written as a linear function of the other n vectors. Then there exists a non-null
vector b = (b_1, ..., b_n) such that


V_{n+1} = b_1 V_1 + ... + b_n V_n.


Take the dot product on both sides with respect to V_i. If all V_1, ..., V_{n+1} are mutually
orthogonal then we have


0 = 0 + b_i ||V_i||^2 + 0 ⇒ b_i = 0, i = 1, ..., n

since ||V_i|| ≠ 0, i = 1, ..., n. This then contradicts the fact that b is a non-null vector.
Thus they cannot be all mutually orthogonal. Since the orthogonal vectors are linearly
independent, proved earlier, the maximum number of n-vectors which are mutually
orthogonal is n.


1.3.1 A vector subspace


The vectors in our discussion so far are ordered n-tuples of real numbers. The notions 
of vector spaces, dimension etc will be introduced for such vectors. Then later we will 
generalize these ideas to cover some general objects called vectors satisfying some 
general postulates. Consider, for example, two given vectors 


U_1 = (1, 0, -1) and U_2 = (2, 3, 1).


Evidently U_1 and U_2 are linearly independent. Two vectors being dependent means
one is a multiple of the other. Consider a collection S_1 of vectors which are spanned
by U_1 and U_2 by the following process. Every scalar multiple of U_1 as well as of U_2 is
in S_1. For example
3U_1 = 3(1, 0, -1) = (3, 0, -3) ∈ S_1
-2U_2 = -2(2, 3, 1) = (-4, -6, -2) ∈ S_1
0U_1 = (0, 0, 0) ∈ S_1.


Every linear combination of U_1 and U_2 is also in S_1. For example,


2U_1 - 5U_2 = (2, 0, -2) + (-10, -15, -5) = (-8, -15, -7) ∈ S_1
U_1 + U_2 = (1, 0, -1) + (2, 3, 1) = (3, 3, 0) ∈ S_1
U_1 + 0U_2 = U_1 ∈ S_1.


Since a scalar multiplication and then addition will create a linear combination the
basic operations are scalar multiplication and addition. Then every element in S_1,
elements are vectors, can be written as a linear combination of U_1 and U_2. In this case
we say that S_1 is spanned or generated or created by U_1 and U_2. Then we say that the
collection {U_1, U_2} is a spanning set of S_1.


Definition 1.3.2 (Vector subspace). Let S be a collection of vectors such that if V_1 ∈ S
then cV_1 ∈ S where c is any scalar, including zero, and if V_1 ∈ S and V_2 ∈ S then
V_1 + V_2 ∈ S. Then S is called a vector subspace.


Another way of defining S is that it is a collection which is closed under scalar
multiplication and addition. When the elements of S are n-vectors (ordered set of n
real numbers) then the operations "scalar multiplication" and "addition" are easily
defined and many properties such as commutativity,

V_1 + V_2 = V_2 + V_1,

associativity

V_1 + (V_2 + V_3) = (V_1 + V_2) + V_3


and so on are easily established. But if the elements of S are some general objects
then the operations "scalar multiplication" and "addition" are to be redefined and
then all types of extra properties are to be double-checked before constructing such a
collection which is closed under "scalar multiplication" and "addition". A more general
definition of S will be introduced later. For the time being the elements in our S
are all n-tuples of real numbers. The null vector is automatically an element of any
such S. That is, O ∈ S. If V ∈ S then V + O = V, -V ∈ S, V - V = O.


Definition 1.3.3 (A spanning set of a vector subspace). A collection of vectors which 
span the whole of a given vector subspace is called a spanning set of that vector sub- 
space. 


Note that there can be a number of spanning sets for a given subspace S. In our
illustrative example C_1 = {U_1, U_2}, where U_1 = (1, 0, -1), U_2 = (2, 3, 1), spans the subspace
S_1. The same subspace could be spanned by C_2 = {U_1, U_2, U_1 + U_2} or C_3 = {U_1, U_1 + 3U_2}
or C_4 = {U_2, U_1 - U_2, 2U_1 + 5U_2, U_1} and so on. Thus, for a given subspace there can be
infinitely many spanning sets. In all the spanning sets, C_1, ..., C_4, above the smallest
number of linearly independent vectors which can span S_1 or the maximum number
of linearly independent vectors in all those spanning sets is 2.


Definition 1.3.4 (A basis for a vector subspace). A set of all linearly independent
vectors in a spanning set of a vector subspace is called a basis for that vector subspace.
That is, a basis is a spanning set consisting of only linearly independent vectors.


As there can be many spanning sets for a given vector subspace there can be
infinitely many bases for a given vector subspace. In our illustrative example B_1 =
{U_1, U_2} is a basis, B_2 = {U_1, U_1 + 3U_2} is another basis, B_3 = {U_1, U_1 - U_2} is a third basis,
but B_4 = {U_2, U_1 - U_2, 2U_1 + U_2} is not a basis because one vector, namely


2U_1 + U_2 = 2(U_1 - U_2) + 3U_2,


is a linear function of the other two. B_4 is a spanning set but not a basis. We are
imposing two conditions for a basis of a vector subspace. (i) A basis is a spanning set for
that vector subspace; (ii) A basis consists of only linearly independent vectors.


Example 1.3.7. Construct 3 bases for the vector subspace spanned by the following 
set of vectors: 


U_1 = (1, 1, 1), U_2 = (1, -1, 2), U_3 = (2, 0, 3).


Solution 1.3.7. Our first step is to determine the number of linearly independent
vectors in the given set so that one set of the maximum number of linearly independent
vectors can be collected. Let us apply the sweep-out process, writing the vectors as
rows.


    1  1  1
    1 -1  2
    2  0  3

-1(1) + (2); -2(1) + (3) ⇒
    1  1  1
    0 -2  1
    0 -2  1

-1(2) + (3) ⇒
    1  1  1
    0 -2  1
    0  0  0

Thus the whole vector subspace S, which is spanned by {U_1, U_2, U_3}, can also be
spanned by {V_1, V_2} where


V_1 = (1, 1, 1), and V_2 = (0, -2, 1).


Hence one basis for S is B_1 = {V_1, V_2}. Any set of 2 linearly independent vectors that
can be constructed by using V_1 and V_2 is also a basis for S. For example,


B_2 = {2V_1, 3V_2}, B_3 = {V_1, V_1 + V_2}


are two more bases for S. Infinitely many such bases can be constructed for the same
vector subspace S. This means that if we start with V_1 only then we can span only a
part of S or a subset of S, say S_1. This S_1 consists of all scalar multiples of V_1. Similarly
if we start with only V_2 we can only span a part of S or a subset of S, say S_2. This S_2
consists of scalar multiples of V_2. Note that the union of S_1 and S_2, S_1 ∪ S_2, is not S. All
linear functions of V_1 and V_2 are also in S. Hence S_1 ∪ S_2 is only a subset of S.
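

The claims of Solution 1.3.7 can be checked numerically. The following Python sketch is
an illustration only; the helper names combo and is_multiple are hypothetical. It verifies
that each of the given vectors is a linear function of V_1 = (1, 1, 1) and V_2 = (0, -2, 1),
and that V_1 + V_2 belongs to neither the set of multiples of V_1 nor the set of multiples
of V_2.

```python
def combo(a, v, b, w):
    """Return the linear combination a*v + b*w of two vectors."""
    return tuple(a * x + b * y for x, y in zip(v, w))

def is_multiple(x, v):
    """True if x = c*v for some scalar c (v is assumed to be non-null)."""
    # Cross-multiplication test: x_i v_j = x_j v_i for all i, j; no division needed.
    k = len(v)
    return all(x[i] * v[j] == x[j] * v[i] for i in range(k) for j in range(k))

V1, V2 = (1, 1, 1), (0, -2, 1)
U1, U2, U3 = (1, 1, 1), (1, -1, 2), (2, 0, 3)

# Each original vector is a linear function of the basis vectors V1 and V2.
print(combo(1, V1, 0, V2) == U1)  # True
print(combo(1, V1, 1, V2) == U2)  # True
print(combo(2, V1, 1, V2) == U3)  # True

# V1 + V2 is in S but is a multiple of neither V1 nor V2.
s = combo(1, V1, 1, V2)
print(is_multiple(s, V1), is_multiple(s, V2))  # False False
```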


Definition 1.3.5 (Dimension of a vector subspace). The maximum number of linearly 
independent vectors in a spanning set of S or the smallest number of linearly indepen- 
dent vectors which can span the whole of S or the number of vectors in a basis of S is 
called the dimension of the subspace S. 


In our illustrative Example 1.3.7 the dimension of S is 2. In general, observe that
for a given subspace S there cannot be two different bases B_1 and B_2 where in B_1 the
number of linearly independent vectors is m_1 whereas in B_2 that number is m_2 with
m_1 ≠ m_2. If possible let m_1 < m_2. Then every vector in S is a linear function of these m_1
vectors and hence by definition there cannot be a vector in S which is linearly
independent of these m_1 vectors. That means m_1 must be equal to m_2.

One more point is worth observing. Since every 3-vector can be written as a linear
function of the basic unit vectors, the vectors U_1 = (1, 0, -1) and U_2 = (2, 3, 1) in our
illustrative example can be written as linear functions of the basic unit vectors


e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1).

Note that

U_1 = e_1 - e_3 and U_2 = 2e_1 + 3e_2 + e_3.


In the set B = {e_1, e_2, e_3} there are 3 linearly independent vectors. We have already seen
that U_1 and U_2 can be written as linear functions of these unit vectors. Thus this set
B could have spanned not only S of our Example 1.3.7, call it S̃, the vector subspace
spanned by U_1 and U_2, but also a much larger space S where our S̃ is a subset or S̃ is
contained in S or S̃ ⊂ S or S̃ is a subspace there. This is why we used the phrase
"subspace" in our definitions. Incidentally, since S ⊂ S we can also call S itself a subspace.


Definition 1.3.6 (Orthogonal subspaces). Consider two subspaces, S and S* of
n-vectors such that for every vector U ∈ S and every vector V ∈ S*, U.V = 0. That
is, vectors in S are orthogonal to the vectors in S* and vice versa. Then S and S* are
called subspaces orthogonal to each other.


Obviously, since the same vector cannot be orthogonal to itself (except the null
vector) the same non-null vector cannot be present in S as well as in S*. For example,
if U_1 = (1, 1, 1) is in S then V_1 = (1, -2, 1) and V_2 = (1, 0, -1) are two possible vectors in S*
since U_1.V_1 = 0 and U_1.V_2 = 0. But V_1 or V_2 or both need not be present in S*.


Example 1.3.8. If U = (1, 2, -1) ∈ S and if S is spanned by U itself then what is the
maximum possible number of linearly independent vectors in a subspace S* orthogonal
to S? Construct a basis for such an S*.


Solution 1.3.8. Let X = (x_1, x_2, x_3) be in S*. Then

U.X = 0 ⇒ x_1 + 2x_2 - x_3 = 0.    (1.3.7)


The maximum number of linearly independent 3-vectors possible is 3. Orthogonal
vectors are linearly independent. Hence the maximum number of linearly independent X
possible is 3 - 1 = 2. In order to construct a basis we construct two linearly independent
X from equation (1.3.7). For example, X_1 = (-2, 1, 0) and X_2 = (-1, 1, 1) are two linearly
independent solutions of (1.3.7). Hence {X_1, X_2} is a basis for the orthogonal space S*.
There can be many such bases for S*, each basis will consist of two linearly independent
solutions of (1.3.7). Note that the subspace spanned by X_1 = (-2, 1, 0) alone will be
orthogonal to S as well as the subspace spanned by X_2 = (-1, 1, 1) alone will be orthogonal
to S. But we were looking for that orthogonal subspace consisting of the maximum
number of linearly independent solutions of (1.3.7).
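

A quick numerical check of Solution 1.3.8 is sketched below in Python. It is not part of
the original text. It uses the cross product of two 3-vectors, a tool that this section
does not rely on, as a convenient assumption: a non-null cross product shows that X_1 and
X_2 are linearly independent, and the cross product itself is orthogonal to both.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

U = (1, 2, -1)
X1, X2 = (-2, 1, 0), (-1, 1, 1)

print(dot(U, X1), dot(U, X2))  # 0 0, both satisfy (1.3.7)
print(cross(X1, X2))           # (1, 2, -1), non-null and equal to U
```

Here the cross product of X_1 and X_2 reproduces U, which is consistent with S being the
set of all vectors orthogonal to the subspace spanned by X_1 and X_2.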


Definition 1.3.7 (Orthogonal complement of a subspace). Let S be a vector subspace 
and S* a subspace orthogonal to S. If all the maximum possible number of linearly 
independent vectors, orthogonal to S, are in S* then S* is the orthogonal complement 
of S and it is usually written as S* = S^⊥.


(viii) If the dimension of a vector subspace S of n-vectors is m < n and if S* is the
orthogonal complement of S then the dimension of S* is n - m. If the dimension of S
is n then the dimension of S* is zero which means S* contains only the null vector.


1.3.2 Gram-Schmidt orthogonalization process 


From a given set U_1, ..., U_k of k linearly independent n-vectors can we create another
set V_1, ..., V_k of vectors which form an orthonormal system and each V_j is a linear
function of the U_j's? That is, V_i.V_j = 0, i ≠ j and ||V_j|| = 1, j = 1, ..., k. The answer to this
question is in the affirmative and the process by which we obtain the set V_1, ..., V_k
from the set U_1, ..., U_k is known as the Gram-Schmidt orthogonalization process. This
process can be described as follows: Take the normalized U_1 as V_1. Construct a V_2
where

V_2 = W_2/||W_2||, W_2 = U_2 + aV_1


where a is a scalar quantity. Since we require V_1 to be orthogonal to W_2 we have
W_2.V_1 = 0 or U_2.V_1 + aV_1.V_1 = U_2.V_1 + a = 0 since V_1.V_1 = 1. Then a = -U_2.V_1. That is,
W_2 = U_2 - (U_2.V_1)V_1 where U_2.V_1 is the dot product of U_2 and V_1. Note that


W_2.V_1 = U_2.V_1 - (U_2.V_1)V_1.V_1 = U_2.V_1 - U_2.V_1 = 0

since V_1.V_1 = ||V_1||^2 = 1. Thus V_1 and V_2 are orthogonal to each other and each one is a
normalized vector. Now, consider the general formula


W_j = U_j - (U_j.V_1)V_1 - (U_j.V_2)V_2 - ... - (U_j.V_{j-1})V_{j-1} for j = 2, ..., k and    (1.3.8)

V_j = W_j/||W_j||.


For example,


W_3 = U_3 - (U_3.V_1)V_1 - (U_3.V_2)V_2,
V_3 = W_3/||W_3||.


Let us see whether W_3 is orthogonal to both V_1 and V_2. Take the dot product

W_3.V_1 = U_3.V_1 - (U_3.V_1)V_1.V_1 - (U_3.V_2)V_2.V_1.

It is already shown that V_2.V_1 = 0 and V_1.V_1 = ||V_1||^2 = 1. Hence


W_3.V_1 = U_3.V_1 - (U_3.V_1) - 0 = 0.

Now take,


W_3.V_2 = U_3.V_2 - (U_3.V_1)V_1.V_2 - (U_3.V_2)V_2.V_2
        = U_3.V_2 - 0 - (U_3.V_2) = 0

since V_2.V_2 = ||V_2||^2 = 1 and V_1.V_2 = 0.
The formula (1.3.8) is constructed by writing W_j as a linear function of U_j, V_1, ...,
V_{j-1}, and then solving for the coefficients by using the conditions that the dot products
of W_j with V_1, ..., V_{j-1} are all zeros. One interesting observation can be made on (1.3.8).
V_j is a linear function of V_1, ..., V_{j-1} and U_j which implies that V_j is a linear function
of U_1, ..., U_j only. That is, V_1 is a function of U_1 only, V_2 is a function of U_1 and U_2 only
and so on, a triangular format.
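

Formula (1.3.8) translates almost line by line into code. The following Python sketch is
an illustration under stated assumptions, not the book's own program: the function name
gram_schmidt is hypothetical, floating point arithmetic is used, and the input vectors are
assumed to be linearly independent (otherwise some ||W_j|| would be zero and the division
would fail).

```python
import math

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return V_1, ..., V_k built from U_1, ..., U_k by formula (1.3.8)."""
    ortho = []
    for u in vectors:
        w = [float(x) for x in u]
        for v in ortho:
            c = dot(u, v)  # the coefficient U_j.V_i
            w = [wi - c * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(dot(w, w))  # ||W_j||, nonzero for independent inputs
        ortho.append([wi / norm for wi in w])
    return ortho

# Illustrative input: U_1 = (1, 1, -1), U_2 = (1, 2, 1), U_3 = (2, 3, 4)
V = gram_schmidt([(1, 1, -1), (1, 2, 1), (2, 3, 4)])
for v in V:
    print([round(x, 4) for x in v])
```

Because each V_j is computed from U_j and the previously constructed V_1, ..., V_{j-1} only,
the loop mirrors the triangular format noted above.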


Example 1.3.9. Given the vectors

U_1 = (1, 1, -1), U_2 = (1, 2, 1), U_3 = (2, 3, 4)


construct an orthonormal system by using U_1, U_2 and U_3, if possible.


Solution 1.3.9. Let


V_1 = U_1/||U_1||, ||U_1|| = √((1)^2 + (1)^2 + (-1)^2) = √3 ⇒ V_1 = (1/√3)(1, 1, -1).

Let

W_2 = U_2 - (U_2.V_1)V_1

where

V_1.U_2 = [(1/√3)(1, 1, -1)].[(1, 2, 1)]
        = (1/√3)[(1)(1) + (1)(2) + (-1)(1)] = 2/√3,

W_2 = U_2 - (U_2.V_1)V_1
    = (1, 2, 1) - (2/3)(1, 1, -1) = (1/3, 4/3, 5/3) = (1/3)(1, 4, 5),

||W_2|| = (1/3)√((1)^2 + (4)^2 + (5)^2) = √42/3 ⇒

V_2 = W_2/||W_2|| = (1/√42)(1, 4, 5).

Note that for any vector U and for any nonzero scalar a, ||aU|| = |a| ||U|| and hence keep
the constants outside when computing the lengths. Consider


W_3 = U_3 - (U_3.V_1)V_1 - (U_3.V_2)V_2,

where

V_1.U_3 = [(1/√3)(1, 1, -1)].[(2, 3, 4)] = 1/√3,
(V_1.U_3)V_1 = (1/3)(1, 1, -1),
V_2.U_3 = (1/√42)(1, 4, 5).(2, 3, 4) = 34/√42,
(V_2.U_3)V_2 = (34/42)(1, 4, 5).

Therefore

W_3 = (2, 3, 4) - (1/3)(1, 1, -1) - (34/42)(1, 4, 5) = (1/7)(6, -4, 2)

with

||W_3|| = √56/7 ⇒ V_3 = W_3/||W_3|| = (1/√56)(6, -4, 2).

Verification

V_1.V_2 = [(1/√3)(1, 1, -1)].[(1/√42)(1, 4, 5)] = 0;
V_1.V_3 = [(1/√3)(1, 1, -1)].[(1/√56)(6, -4, 2)] = 0;
V_2.V_3 = [(1/√42)(1, 4, 5)].[(1/√56)(6, -4, 2)] = 0.


Thus V_1, V_2, V_3 is the system of orthonormal vectors available from U_1, U_2, U_3.

Example 1.3.10. Given the vectors

U_1 = (1, 1, -1), U_2 = (1, 2, 1), U_3 = (2, 3, 0)

construct an orthonormal system by using U_1, U_2, U_3, if possible.


Solution 1.3.10. Since U_1 and U_2 are the same as the ones in Example 1.3.9 we have


V_1 = (1/√3)(1, 1, -1) and V_2 = (1/√42)(1, 4, 5).


Now, consider the equation


W_3 = U_3 - (U_3.V_1)V_1 - (U_3.V_2)V_2


where

V_1.U_3 = [(1/√3)(1, 1, -1)].[(2, 3, 0)] = 5/√3,
(V_1.U_3)V_1 = (5/3)(1, 1, -1),
V_2.U_3 = [(1/√42)(1, 4, 5)].[(2, 3, 0)] = 14/√42,
(V_2.U_3)V_2 = (14/42)(1, 4, 5).


Then


W_3 = (2, 3, 0) - (5/3)(1, 1, -1) - (14/42)(1, 4, 5) = (0, 0, 0).


In this case the only orthogonal system possible is with a null vector and the non-null
vectors V_1 and V_2. Here V_1 and V_2 are orthonormal but a null vector is orthogonal but
not a normal vector. This situation arose because in the original set U_1, U_2, U_3, not all
vectors are linearly independent. U_3 could have been written as a linear function of
U_1 and U_2, in fact U_3 = U_1 + U_2, and that is why W_3 became null.


(ix) If there are m_1 dependent vectors and m_2 linearly independent vectors in a given
system of m_1 + m_2 vectors of the same category then when the Gram-Schmidt
orthogonalization process is applied on these m_1 + m_2 vectors we get only m_2 orthonormal
vectors and the remaining m_1 will be null vectors.


When we start with a given set of vectors U_1, ..., U_k we do not know whether it is a
linearly independent or dependent system. Hence, start with the orthogonalization
process. If a W_j becomes null, ignore the corresponding U_j and proceed with the
remaining to obtain a set of orthonormal vectors. This will be m_2 in number if in the
original set U_1, ..., U_k only m_2 were linearly independent.
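

The rule "if a W_j becomes null, ignore the corresponding U_j" can be sketched as follows.
This Python sketch is an assumption-laden illustration, not the book's procedure verbatim:
the function name orthogonalize is hypothetical, and to keep the arithmetic exact it works
with the unnormalized vectors W_j and the coefficients (U_j.W_i)/(W_i.W_i), which give the
same W_j as (1.3.8) because V_i = W_i/||W_i||. Normalization is left as a final step.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(vectors):
    """Return the non-null W_j's; a U_j that yields a null W_j is ignored."""
    ws = []
    for u in vectors:
        u_exact = [Fraction(x) for x in u]
        w = list(u_exact)
        for prev in ws:
            c = dot(u_exact, prev) / dot(prev, prev)  # equals (U_j.V_i)/||W_i||
            w = [wi - c * pi for wi, pi in zip(w, prev)]
        if any(w):  # keep W_j only if it is not the null vector
            ws.append(w)
    return ws

# Illustrative input with U_3 = U_1 + U_2, so only two vectors survive.
W = orthogonalize([(1, 1, -1), (1, 2, 1), (2, 3, 0)])
print(len(W))  # 2
print(W[1])    # the exact fractions 1/3, 4/3, 5/3
```

Dividing each surviving W_j by its length then gives the orthonormal vectors; the number
of survivors is the number of linearly independent vectors in the original set.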


Note. For a more rigorous definition of a vector space we will wait until after the dis- 
cussion of matrices so that these objects can also be included as elements in such a 
vector space. 


Exercises 1.3 


1.3.1. Check for linear dependence or independence in the following set of vectors:


          [ 1]         [ 2]         [ 3]         [ 1]
          [-1]         [ 0]         [ 1]         [-1]
(a) U_1 = [ 0],  U_2 = [-1],  U_3 = [ 1],  U_4 = [ 0];
          [ 1]         [ 1]         [-1]         [ 0]
          [ 2]         [ 5]         [ 1]         [ 1]


(b) U_1 = (2, 0, 1, -1), U_2 = (3, 0, -1, 2), U_3 = (5, 0, 0, 1);
(c) U_1 = (3, 1, -1, 1, 2), U_2 = (5, 1, 2, -1, 0), U_3 = (7, -1, 1, -1, 0).


1.3.2. For each case in Exercise 1.3.1 find a basis for the vector subspace spanned by
the vectors in the set.


1.3.3. For each of the subspaces spanned by the vectors in Exercise 1.3.1 construct a 
basis for the orthogonal complement and compute the dimensions of each of these 
orthogonal complements. 


1.3.4. For each set of vectors in Exercise 1.3.1 construct a set of (i) mutually orthogonal 
vectors as linear functions of the given set of vectors, (ii) a set of orthonormal system 
of vectors as linear functions of the given set of vectors, if possible. 


1.3.5. Let U_1 and U_2 be two linearly independent 2-vectors. Let V be an arbitrary
2-vector. Show that V can be written as a linear function of U_1 and U_2.


1.3.6. Illustrate the result in Exercise 1.3.5 geometrically.


1.3.7. Let U_1, U_2 and U_3 be three linearly independent 3-vectors and let V be an
arbitrary 3-vector. Show that V can be written as a linear function of U_1, U_2 and U_3.


1.3.8. Treating vectors as arrowheads let U_1 = (1, 1, -1) = i + j - k and U_2 = (2, 1, 0) = 2i + j.
Give a geometric interpretation of a basis for the subspace orthogonal to the subspace
spanned by U_1 and U_2.


1.3.9. In the language of analytical geometry two lines in a plane are perpendicular
to each other if the product of their slopes is -1. Express this statement in terms of the
dot product of two vectors being zero.


1.3.10. Find all vectors which are orthogonal to both U_1 = (1, 1, 1, -1) and U_2 = (2, 1, 3, 2).


1.3.11. If U_1 = (1, 1, 1) and U_2 = (1, 1, -1), are the following true? Prove your assertions by
using the definition of linear independence. (i) U_1 and 2U_1 - U_2, (ii) U_1 + U_2 and U_1 - U_2,
(iii) U_1 - U_2 and 2U_1 + 2U_2, (iv) U_1 + U_2 and 2U_1 - 2U_2, are all linearly independent.


1.3.12. Consider a subspace spanned by the vectors U_1 and U_2 in Exercise 1.3.11. Is it
true that the sets in (i) to (iv) there are bases for that subspace? Justify your answer.


1.3.13. Let S be the vector subspace spanned by U_1 and U_2 of Exercise 1.3.11. Construct
2 bases for the orthogonal complement S* of S. What are the dimensions of S and S*?


1.3.14. Consider a 3-space and two planes passing through the origin. Consider the 
normals to these planes. Construct 3 bases for the subspace spanned by these normals 
if (1) the planes are parallel, (2) the planes are perpendicular to each other, (3) the 
planes are neither parallel nor perpendicular to each other. 


1.3.15. In Exercise 1.3.14 construct the orthogonal complements of the subspaces 
spanned in the three cases and find 2 bases each for these orthogonal complements. 


1.3.16. Let V_j ∈ S, j = 1, 2, ... be n-vectors where S is a vector space of dimension n.
Show that any set of n linearly independent V_j's is a basis of S.


1.3.17. Let V_j ∈ S, j = 1, ..., r, 1 ≤ r ≤ n - 1 where the dimension of S is n and all V_j's are
n-vectors. If V_1, ..., V_r are linearly independent then show that there exist n - r other
elements V_{r+1}, ..., V_n of S such that V_1, ..., V_n is a basis of S.


1.3.18. Let S be the vector space of all 1 × 3 vectors. Let S_1 be spanned by V_1 = (1, 1, 1),
V_2 = (50, -2), V_3 = (2, 1, 0) and S_2 be spanned by U_1 = (2, 1, 1), U_2 = (3, 1, -1). Show that
(1) S_1 ⊂ S, S_2 ⊂ S, that is, S_1 and S_2 are subspaces in S. (2) S_1 ∩ S_2 ≠ O, that is, the
intersection is not empty. (3) Determine the dimensions of S_1 and S_2. (4) Construct the
subspace S_3 such that if W ∈ S_3 then W = V + U where V ∈ S_1 and U ∈ S_2. [This S_3 ⊂ S
is called a simple sum of S_1 and S_2 and it is usually written as S_3 = S_1 + S_2.]


1.3.19. Consider the same S as in Exercise 1.3.18. Let

e_1 = (1, 0, 0), e_2 = (0, 1, 0), e_3 = (0, 0, 1).


Let S_1 be spanned by e_1 and e_2 and S_2 be spanned by e_3. Show that (1) S_1 ⊂ S and S_2 ⊂ S.
(2) S_1 ∩ S_2 = O. (3) Construct S_3 as in Exercise 1.3.18.


1.3.20. Direct sum of subspaces. Let S be a finite dimensional linear space (vector
space) and let S_1 and S_2 be subspaces of S. Then the simple sum of S_1 and S_2, denoted
by S_1 + S_2, is the set of all sums of the type U + V where U ∈ S_1 and V ∈ S_2. Note that
S_1 + S_2 is also a subspace of S. In addition, if S_1 ∩ S_2 = O, that is, the intersection of
S_1 and S_2 is null or empty then the simple sum is called a direct sum, and it will be
denoted by S_1 ⊕ S_2. Show that for the simple sums,


dim(S_1 + S_2) + dim(S_1 ∩ S_2) = dim(S_1) + dim(S_2)


where dim(·) denotes the dimension of (·) and + the simple sum.


1.3.21. Let S_j, j = 1, ..., k be subspaces of a finite dimensional space S. Show that, for
the simple sums,


dim(S_1 + ... + S_k) ≤ Σ_{j=1}^{k} dim(S_j).


1.3.22. Let S_1 and S_2 be as in Exercise 1.3.20. Then show that every element W ∈ (S_1 +
S_2) can be written as W = U + V, U ∈ S_1, V ∈ S_2 and that this decomposition W = U + V
is unique if and only if S_1 ∩ S_2 = O where O means a null set.


1.3.23. Let $S_0, S_1, \ldots, S_k$ be subspaces of a finite dimensional linear space S. Show that
the subspace $S_0$ can be written as a direct sum of the subspaces $S_1,\ldots,S_k$ if and only
if the union of the bases for $S_1,\ldots,S_k$ forms a basis for $S_0$.


1.3.24. Let $S_j \subset S$, $j = 0,1,\ldots,k$ where S is a finite dimensional linear space. Show that

$$S_0 = S_1 \oplus \cdots \oplus S_k$$

if and only if

$$\dim(S_0) = \sum_{j=1}^{k} \dim(S_j).$$

1.3.25. Let $S_j$, $j = 0,1,\ldots,k$ be as in Exercise 1.3.24. Show that

$$S_0 = S_1 \oplus \cdots \oplus S_k$$

if and only if

$$S_i \cap (S_1 + \cdots + S_{i-1}) = O, \quad i = 1,\ldots,k$$

where O is a null set.


1.3.26. By using vector methods prove that the segment joining the midpoints of two 
sides of any triangle is parallel to the third side and half as long. 


1.3.27. By using vector methods prove that the medians of a triangle (the line seg- 
ments joining the vertices to the midpoints of opposite sides) intersect in a point of 
trisection of each. 


1.3.28. By using vector methods prove that the midpoints of the sides of any plane 
convex quadrilateral are the vertices of a parallelogram. 


1.3.29. By using vector methods prove that the lines from any vertex of a parallelo- 
gram to the midpoints of the opposite sides trisect the diagonal they intersect. 


1.3.30. If $U_1,\ldots,U_k$ is a finite collection of vectors and if $\|U_j\|$ denotes the length of $U_j$
then show that

$$\|U_1 + \cdots + U_k\| \le \|U_1\| + \|U_2\| + \cdots + \|U_k\|.$$


1.4 Some applications 


We will explore a few applications of vector methods in multivariable calculus, sta- 
tistical problems, model building and other related areas. The students who are not 
familiar with multivariable calculus may skip this section. 


1.4.1 Partial differential operators 


Consider a scalar function (as opposed to a vector function) of many real scalar (as
opposed to vector) variables, $f(x_1,\ldots,x_n)$, where $x_1,\ldots,x_n$ are functionally independent
(no variable can be written as a function of the other variables), or distinct, real
variables. For example,

(i) $f = 2x_1^2 + x_2^2 - 3x_1x_2 + x_1 - 5x_2 + 8$
(ii) $f = 3x_1^2 + 2x_2^2 - x_1x_2 - x_2 - 2x_1 + 10$

are two such functions of two real scalar variables $x_1$ and $x_2$. Consider the vector of
partial differential operators. Let us use the following notations:


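$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_1} \\ \frac{\partial}{\partial x_2} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{bmatrix}, \quad
\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix},$$

$$\frac{\partial}{\partial X'} = \left(\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\right), \quad
\frac{\partial f}{\partial X'} = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right). \tag{1.4.1}$$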


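For example, $\frac{\partial}{\partial x_1} f$ means to differentiate f with respect to $x_1$ partially, which means
assuming all other variables $x_2,\ldots,x_n$ to be constants. In (ii) above $\frac{\partial}{\partial x_1}$ operating on f
gives

$$\begin{aligned}
\frac{\partial f}{\partial x_1} &= \frac{\partial}{\partial x_1}(3x_1^2 + 2x_2^2 - x_1x_2 - x_2 - 2x_1 + 10) \\
&= \frac{\partial}{\partial x_1}(3x_1^2) + \frac{\partial}{\partial x_1}(2x_2^2) + \frac{\partial}{\partial x_1}(-x_1x_2)
 + \frac{\partial}{\partial x_1}(-x_2) + \frac{\partial}{\partial x_1}(-2x_1) + \frac{\partial}{\partial x_1}(10) \\
&= 6x_1 + 0 - x_2 + 0 - 2 + 0 \\
&= 6x_1 - x_2 - 2.
\end{aligned}$$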


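Similarly $\frac{\partial}{\partial x_2}$ operating on this f gives

$$\begin{aligned}
\frac{\partial f}{\partial x_2} &= \frac{\partial}{\partial x_2}(3x_1^2 + 2x_2^2 - x_1x_2 - x_2 - 2x_1 + 10) \\
&= 0 + 4x_2 - x_1 - 1 - 0 + 0 \\
&= 4x_2 - x_1 - 1.
\end{aligned}$$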


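Then $\frac{\partial}{\partial X}$ operating on f is a column vector, namely,

$$\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{bmatrix}
= \begin{bmatrix} 6x_1 - x_2 - 2 \\ 4x_2 - x_1 - 1 \end{bmatrix}.$$

The transpose of this vector is denoted by $\frac{\partial f}{\partial X'}$ ($\frac{\partial}{\partial X'}$ operating on f). That is,

$$\frac{\partial f}{\partial X'} = (6x_1 - x_2 - 2,\; 4x_2 - x_1 - 1).$$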
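
The partial derivatives of example (ii) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names and the step size `h` are illustrative assumptions, and the central-difference approximation is used only as a sanity check of the formulas $6x_1 - x_2 - 2$ and $4x_2 - x_1 - 1$.

```python
def f(x1, x2):
    """Example (ii): f = 3*x1^2 + 2*x2^2 - x1*x2 - x2 - 2*x1 + 10."""
    return 3 * x1**2 + 2 * x2**2 - x1 * x2 - x2 - 2 * x1 + 10


def analytic_gradient(x1, x2):
    """The column vector of partial derivatives, written as a tuple."""
    return (6 * x1 - x2 - 2, 4 * x2 - x1 - 1)


def numeric_gradient(func, x1, x2, h=1e-6):
    """Central differences: vary one variable, hold the other constant."""
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return (d1, d2)


point = (1.5, -2.0)  # an arbitrary test point
exact = analytic_gradient(*point)
approx = numeric_gradient(f, *point)
print(exact, approx)
```

Holding the other variable fixed while perturbing one variable mirrors the definition of partial differentiation used above.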

1.4.2 Maxima/minima of a scalar function of many real scalar variables


When looking for points where the function may have a local maximum or a local
minimum we differentiate the function partially with respect to each variable, equate
the derivatives to zero and solve the resulting system of equations. The solutions are
the critical points (turning points), that is, the points where the function may have a
local maximum, a local minimum or a saddle point. These steps, in vector notation,
are equivalent to solving the equation

$$\frac{\partial f}{\partial X} = O$$


where O denotes the null vector. In our illustrative example 


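$$\frac{\partial f}{\partial X} = O \;\Rightarrow\;
\begin{bmatrix} 6x_1 - x_2 - 2 \\ 4x_2 - x_1 - 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{1.4.2}$$

That is,

(a) $6x_1 - x_2 - 2 = 0$,
(b) $4x_2 - x_1 - 1 = 0$.

When solving $\frac{\partial f}{\partial X} = O$ we need not write down the individual equations as in (a) and
(b) above. One can use matrix methods, which will be discussed in the next chapter,
and solve (1.4.2) directly. Solving (a) and (b) we have

$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 9/23 \\ 8/23 \end{bmatrix}.$$

In our illustrative example there is only one critical point

$$(x_1, x_2) = \left(\frac{9}{23}, \frac{8}{23}\right).$$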
This critical point may correspond to a maximum or a minimum or it may be a saddle
point. In order to check for maxima/minima we examine the whole matrix of second
order partial derivatives and check the definiteness of this matrix. This
aspect will be considered after introducing matrices in the next chapter.
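
The critical point can be verified with exact rational arithmetic. This is a sketch only: the helper `solve_2x2` is a hypothetical name, and it uses Cramer's rule for a 2 × 2 system, which is a substitute for the matrix methods that the text defers to the next chapter.

```python
from fractions import Fraction


def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x1 + a12*x2 = b1 and a21*x1 + a22*x2 = b2 (Cramer's rule)."""
    det = Fraction(a11 * a22 - a12 * a21)
    if det == 0:
        raise ValueError("the system has no unique solution")
    x1 = Fraction(b1 * a22 - a12 * b2) / det
    x2 = Fraction(a11 * b2 - b1 * a21) / det
    return x1, x2


# (a) 6*x1 - x2 = 2 and (b) -x1 + 4*x2 = 1, rearranged from the text
x1, x2 = solve_2x2(6, -1, 2, -1, 4, 1)
print(x1, x2)
```

Substituting the returned values back into (a) and (b) confirms that both partial derivatives vanish at this point.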


1.4.3 Derivatives of linear and quadratic forms

Some obvious results when we use the operator $\frac{\partial}{\partial X}$ on linear and quadratic forms will
be examined here. A linear form is available by taking a dot product of X with a
constant vector. For example if

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad \text{and} \quad a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$$

then

$$X.a = a.X = a_1x_1 + \cdots + a_nx_n \tag{1.4.3}$$

is a linear form. For example,


$$y_1 = 2x_1 - x_2 + 3x_3 + x_4$$

$$y_2 = x_1 + x_2 + x_3 + x_4 - 2x_5 + 7x_6$$

are two linear forms. In a linear form every term is of degree one, that is, a linear
form is homogeneous of degree 1 in the variables. For example, the degree of a term
is determined as follows: $3x^5$ (degree 0 + 5 = 5), $x_1^5 + 3x_2^5$ (each term is of degree 5),
$2x_1^4x_2$ (degree 0 + 4 + 1 = 5), $6x_1$ (degree 0 + 1 = 1, linear),
5 (degree 0, constant).

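What will be the result if a linear form is operated with the operator $\frac{\partial}{\partial X}$? Let $y = X.a$
then

$$\frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}
= \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = a.$$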

Hence we have the following important result: 


(i) Consider the operator $\frac{\partial}{\partial X}$ and the linear form X.a where a is a constant vector.
Then

$$y = X.a = a.X \;\Rightarrow\; \frac{\partial y}{\partial X} = a$$

where a is the column vector of the coefficients in X.a.
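
Result (i) can be illustrated numerically. The sketch below assumes a hypothetical coefficient vector `a` and an arbitrary point `x0` (neither comes from the text) and approximates each partial derivative of the linear form by a central difference; the approximations should reproduce the coefficients regardless of the point chosen.

```python
def linear_form(a, x):
    """The dot product X.a = a1*x1 + ... + an*xn."""
    return sum(ai * xi for ai, xi in zip(a, x))


def numeric_gradient(func, x, h=1e-6):
    """Approximate the vector of partial derivatives of func at the point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


a = [3.0, -1.0, 0.5]    # hypothetical constant vector
x0 = [0.2, 1.7, -4.0]   # arbitrary point; the gradient does not depend on it
grad = numeric_gradient(lambda x: linear_form(a, x), x0)
print(grad)
```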


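Example 1.4.1. Evaluate $\frac{\partial y}{\partial X}$ if

$$y = x_1 - 5x_2 + x_3 - 2x_4.$$

Solution 1.4.1.

$$\frac{\partial y}{\partial x_1} = 1, \quad \frac{\partial y}{\partial x_2} = -5, \quad \frac{\partial y}{\partial x_3} = 1, \quad \frac{\partial y}{\partial x_4} = -2$$

and hence

$$\frac{\partial y}{\partial X} = \begin{bmatrix} 1 \\ -5 \\ 1 \\ -2 \end{bmatrix}.$$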


Now, let us examine a simple quadratic form. Consider the sum of squares of a
number of variables. Let

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \quad \text{then} \quad X.X = x_1^2 + x_2^2 + \cdots + x_n^2.$$

This is a special case of a quadratic form. In a quadratic form every term is of degree 2,
that is, it is a homogeneous function of degree 2 in the variables. For the time
being, we consider the above simple quadratic form. More general quadratic forms
will be considered after introducing matrices in the next chapter. What will happen
if a sum of squares is operated with the operator $\frac{\partial}{\partial X}$? Proceeding as in the linear case,
and noting that $\frac{\partial}{\partial x_j}(x_1^2 + \cdots + x_n^2) = 2x_j$, the result is the following:

(ii) Let $y = X.X = x_1^2 + \cdots + x_n^2$ then

$$\frac{\partial y}{\partial X} = 2X.$$


1.4.4 Model building 


Suppose that a gardener suspects that the growth of a particular species of plant 
(growth measured in terms of the height of the plant) is linearly related to the amount 
of a certain fertilizer used. Let the amount of the fertilizer used be denoted by x and 
the corresponding growth (height) be y. Then the gardener's suspicion is that 


y=a+bx 


where a and b are some constants, that is, y and x are linearly related. What exactly is
this linear relationship? The gardener conducts an experiment to estimate the values
of a and b. Suppose that the gardener applies the amounts $x_1,\ldots,x_n$ of the fertilizer x
on different plants of the same species, in a carefully planned experiment, and takes
the corresponding measurements $y_1,\ldots,y_n$ on y. Thus the gardener has the following
pairs of values $(x_i, y_i)$, $i = 1,\ldots,n$. For example, when one spoon of fertilizer (measured
in spoon units) is applied and the growth (measured after a fixed time) noted is 3 inches
(growth measured in inches) then the corresponding pair is $(x_1, y_1) = (1, 3)$. If y = a + bx
is a mathematical relationship then every pair (x, y) should satisfy the equation
y = a + bx. Then we need only two pairs of values on (x, y) to exactly evaluate a and b
and then every other value on (x, y) must satisfy the relationship. But this is not the
situation here. The gardener is thinking that there may be a relationship between x and
y, that the relationship may be linear and that she will be able to estimate y
at a preassigned value of x. Then the error in estimating y by using such a relationship
at a given value of x is y - (a + bx). Denoting the error in the i-th pair by $e_i$ we have

$$e_i = y_i - a - bx_i.$$

One way of estimating the unknown parameters a and b is to minimize the sum of
squares of the errors (error = observed value minus the modeled value, whatever be
the model, linear or not). Such a method of estimating the parameters in a model by
minimizing the error sum of squares is known as the method of least squares. The error
vector and the error sum of squares in our linear model are given by

$$e = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}, \quad
e.e = e_1^2 + \cdots + e_n^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2. \tag{1.4.4}$$


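Equation (1.4.4) can be written in a more elegant way as a quadratic form after
discussing matrices. Let the vector of unknowns be denoted by $\alpha = \begin{bmatrix} a \\ b \end{bmatrix}$. Then the method
of least squares implies that e.e is minimized with respect to $\alpha$. Since e.e is a
non-negative quantity that can be made arbitrarily large, its maximum is at $+\infty$, and
hence a finite critical point corresponds to a minimum. The minimizing
equations, often known as the normal equations in least square analysis, are the
following:

$$\frac{\partial}{\partial \alpha}(e.e) = O \;\Rightarrow\;
\begin{bmatrix} \frac{\partial}{\partial a} \\ \frac{\partial}{\partial b} \end{bmatrix}(e.e) = O \;\Rightarrow\;
\begin{bmatrix} -2\sum_{i=1}^{n}(y_i - a - bx_i) \\ -2\sum_{i=1}^{n} x_i(y_i - a - bx_i) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow$$

$$\sum_{i=1}^{n}(y_i - a - bx_i) = 0 \tag{a}$$

and

$$\sum_{i=1}^{n} x_i(y_i - a - bx_i) = 0 \tag{b}$$

since $-2 \neq 0$. Opening up the sum we have, from (a) and (b),

$$\left(\sum_{i=1}^{n} y_i\right) - na - b\left(\sum_{i=1}^{n} x_i\right) = 0 \tag{c}$$

and

$$\left(\sum_{i=1}^{n} x_iy_i\right) - a\left(\sum_{i=1}^{n} x_i\right) - b\left(\sum_{i=1}^{n} x_i^2\right) = 0. \tag{d}$$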


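Denoting

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{and} \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

and solving (c) and (d) we get the values of a and b. Let us denote these estimates by
$\hat{a}$ and $\hat{b}$ respectively. Then we have

$$\hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}
= \frac{\left(\sum_{i=1}^{n} x_iy_i\right) - n\bar{x}\bar{y}}{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2} \tag{1.4.5}$$

and

$$\hat{a} = \bar{y} - \hat{b}\bar{x}. \tag{1.4.6}$$

From (1.4.5) and (1.4.6) we have the estimates for a and b, and the estimated linear
model by using the method of least squares is then

$$y = \hat{a} + \hat{b}x. \tag{1.4.7}$$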


Example 1.4.2. In a feeding experiment with beef cattle the farmer suspects that the
increase in weight is linearly related to the quantity of a particular combination of
feed. The farmer has obtained the following data. Construct the estimating function
by the method of least squares and then estimate the gain in weight if the quantity of
feed is 2.2 kg.

Data:
    y = (gain in weight in kg)      0.5   0.8   1.5   2.0
    x = (quantity of feed in kg)    1.2   1.5   2.0   2.5

Solution 1.4.2.

$$\bar{x} = \frac{1.2 + 1.5 + 2.0 + 2.5}{4} = 1.8, \quad \bar{y} = \frac{0.5 + 0.8 + 1.5 + 2.0}{4} = 1.2.$$

For convenience of computations let us form the following table: [Use a calculator or
computer to compute $\hat{a}$ and $\hat{b}$ directly.]

     x      y     x - x̄    y - ȳ    (x - x̄)²    (x - x̄)(y - ȳ)

    1.2    0.5    -0.6     -0.7      0.36         0.42
    1.5    0.8    -0.3     -0.4      0.09         0.12
    2.0    1.5     0.2      0.3      0.04         0.06
    2.5    2.0     0.7      0.8      0.49         0.56
                                     0.98         1.16

$$\hat{b} = \frac{1.16}{0.98} \approx 1.1837, \quad \hat{a} = 1.2 - \frac{1.16}{0.98}(1.8) \approx -0.9306.$$

The estimated model is

$$y = -0.9306 + 1.1837x.$$

Then the predicted value of y at x = 2.2 is

$$\hat{y} = -0.9306 + 1.1837(2.2) \approx 1.6735 \text{ kg}.$$
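
The computation in Solution 1.4.2 can be reproduced with a few lines of code. This is a minimal sketch of formulas (1.4.5) and (1.4.6); the function name `fit_line` and the variable names are illustrative assumptions, not notation from the text.

```python
def fit_line(xs, ys):
    """Least squares estimates (a_hat, b_hat) for the model y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b_hat = sxy / sxx               # equation (1.4.5)
    a_hat = y_bar - b_hat * x_bar   # equation (1.4.6)
    return a_hat, b_hat


feed = [1.2, 1.5, 2.0, 2.5]   # x, quantity of feed in kg
gain = [0.5, 0.8, 1.5, 2.0]   # y, gain in weight in kg
a_hat, b_hat = fit_line(feed, gain)
prediction = a_hat + b_hat * 2.2
print(round(a_hat, 4), round(b_hat, 4), round(prediction, 4))
```

Rounded to four decimals, the printed values should agree with the estimates and the prediction obtained by hand in the solution above.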

Exercises 1.4


1.4.1. Find the critical points for the following functions and then check to see 
whether these correspond to maxima or minima or something else: 

(a) f 22x] a 3x; + 5X3X3 — X4 4 5. 

(b) f =x? «xi - 2x x, - 5x, - 2x, 4 8. 


1.4.2. Evaluate $\frac{\partial f}{\partial X}$ and write the results in vector notations:
(a) f 23x, -X * 5x3 - x, + 10. 

(D f =x? + 2x1X + XX3 - X2 + 3X5. 

(c) f 22x; 433 433 = 5X1X + XX3. 


1.4.3. Write the operator $\frac{\partial}{\partial X}$. Then on each element of this vector apply the operator
$\frac{\partial}{\partial X'}$. Explain what you have in this configuration of n rows and n columns.


1.4.4. Apply the operator $\frac{\partial}{\partial X}\frac{\partial}{\partial X'}$ on f in each of (a), (b), (c) in Exercise 1.4.2.


1.4.5. Fit linear models of the type y = a + bx for the following data:
(a) (x, y) = {(0, 2), (1, 5), (2, 6), (3, 9)}.
(b) (x, y) = {(-1, 1), (-2, -2), (0, 3), (1, 6)}.


1.4.6. Fit a model of the type $y = a + bx + cx^2$ to the following data:

(x, y) = {(-1, 2), (0, 1), (1, 5), (2, 7), (3, 21)}.


1.4.7. In statistical distribution theory the moment generating function of a real vector
random variable $X' = (x_1,\ldots,x_k)$ is denoted by M(T), $T' = (t_1,\ldots,t_k)$, where T is a vector
of parameters. When M(T) is evaluated for the real multivariate Gaussian distribution
we obtain

$$M(T) = e^{G(T)}$$

where

$$G(T) = t_1\mu_1 + \cdots + t_k\mu_k + \frac{1}{2}\sum_{i,j=1}^{k} \sigma_{ij}t_it_j$$

where $\mu_1,\ldots,\mu_k$ as well as $\sigma_{ij}$, $i = 1,\ldots,k$, $j = 1,\ldots,k$ are constants, free of T. When M(T)
is available and differentiable, the expected value of X or the first moment of X,
denoted by $\mu = E(X)$, is obtained as $\frac{\partial}{\partial T}M(T)\big|_{T=O}$, that is, the first derivative evaluated
at T = O, and the variance-covariance matrix is $\frac{\partial}{\partial T}\frac{\partial}{\partial T'}M(T)\big|_{T=O} - \mu\mu'$. Evaluate E(X)
and the variance-covariance matrix for the multivariate Gaussian distribution.


1.4.8. The exponential series is

$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots$$

Consider the operator $D = \frac{d}{dx}$. Then

$$e^{xD} = (xD)^0 + \frac{xD}{1!} + \frac{x^2D^2}{2!} + \cdots$$

where, for example, $D^r = DD\cdots D$ stands for D operating repeatedly r times. Let $e^{xD}f_0$
denote $e^{xD}$ operating on f and then $D^rf$ is evaluated at x = 0, r = 0, 1, .... Then

$$e^{xD}f_0 = f(0) + \frac{x}{1!}f^{(1)}(0) + \frac{x^2}{2!}f^{(2)}(0) + \cdots$$

This is Taylor series in one variable. Now consider a two variable case. Let

$$\nabla = \begin{bmatrix} D_1 \\ D_2 \end{bmatrix}, \quad D_i = \frac{\partial}{\partial x_i}, \quad i = 1, 2$$

and the increment vector at the point $(a_1, a_2)$ is $\Delta' = (x_1 - a_1, x_2 - a_2)$. Then the dot
product is given by

$$\nabla.\Delta = (x_1 - a_1)D_1 + (x_2 - a_2)D_2.$$

As before, let $e^{\nabla.\Delta}f_a$ denote $e^{\nabla.\Delta}$ operating on f where the various derivatives are
evaluated at the point $(a_1, a_2)$. Write down the Taylor series expansion for two variables
$(x_1, x_2)$ at the point $(a_1, a_2)$ explicitly up to the terms involving all the second order
derivatives.


1.4.9. By using the operator $\nabla$ in Exercise 1.4.8 expand the following functions by
using Taylor expansion, at the specified points:

(a) x? + 2xpd «xi 54 x; 4 7 at (1,1). 

(b) 2x? 4 xi - 33x; +8 at (22, 3). 

(c) xf 31x 3x] - XxX + 4 at (2,0). 


1.4.10. Extend the ideas in Exercise 1.4.8 to a scalar function $f(x_1, x_2, x_3)$ of 3 real
variables $x_1, x_2, x_3$, at the point $(a_1, a_2, a_3)$. In this case $D_i = \frac{\partial}{\partial x_i}$, i = 1, 2, 3. Evaluate the first
few terms of the series explicitly up to the terms involving $(\nabla.\Delta)^2$.


1.4.11. Apply the result in Exercise 1.4.10 to expand the following function up to terms
involving $(\nabla.\Delta)^2$, and at the point (1, 0, -1):


2 


xje 775 4 5y3y2x, — C1132, 


1.4.12. For Exercise 1.4.5 (a) estimate y at (i) x = 2.7, (ii) x = 3.1. Is it reasonable to use 
the model to estimate y at x = 10? 


1.4.13. For Exercise 1.4.6 estimate y at (i) x = 0.8, (ii) x = 3.1. Is it reasonable to predict 
y at (iii) x = —4, (iv) x = 8 by using the same model? 

1.4.14. Use the method of least squares to fit the model

$$y = a_0 + a_1x_1^2 + a_2x_1x_2 + a_3x_2^2$$

to the following data:

$(x_1, x_2, y)$ = (0, 0, 1), (0, 1, 0), (0, 2, -2), (1, -1, -1), (2, 1, 8), (1, 2, 3).


1.4.15. Can the method of least squares, as minimizing the error sum of squares with
error defined as “observed minus the modeled value”, be used to fit the model $y = ab^x$
to the data

$$(x, y) = (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$


and if not what are the difficulties encountered? 


2 Matrices 


2.0 Introduction 


One of the most elegant tools in simplifying matters and dealing with systems of
objects is the entity called a matrix. The plural of the word matrix is matrices. Suppose we
have a set of mn objects, such as mn real numbers, and if these objects are arranged in
m rows and n columns we get the configuration called a matrix. For example, if 6
numbers are arranged in 2 rows and 3 columns we get a matrix, if the same numbers are
arranged in 3 rows and 2 columns we get another matrix, with one row and 6 columns we
get a third matrix and so on:


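$$A_1 = \begin{bmatrix} 5 & -1 & 2 \\ 1 & 4 & 0 \end{bmatrix} = \text{a matrix},$$

$$A_2 = \begin{bmatrix} 5 & 1 \\ -1 & 4 \\ 2 & 0 \end{bmatrix} = \text{another matrix},$$

$$A_3 = \begin{bmatrix} 5 & 1 & -1 & 4 & 2 & 0 \end{bmatrix} = \text{another matrix},$$

$$A_4 = \begin{bmatrix} 5 & -1 & 2 & 1 \\ 4 & 0 & & \end{bmatrix} = \text{not a matrix}.$$

In the last representation two positions are empty and hence it is not a matrix. Here
$A_1$ has 2 rows and 3 columns whereas $A_3$ has one row and 6 columns.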


Notation 2.0.1. If a matrix has m rows and n columns it is called an m × n (m by n)
matrix.

Here m represents the number of rows and n represents the number of columns.
In our illustrative examples $A_1$ is a 2 × 3 (not 6) matrix, $A_2$ is a 3 × 2 matrix and $A_3$ is
a 1 × 6 matrix. The symbol × (cross) simply separates the numbers m and n and it is
not used as a multiplication symbol in this notation. m.n or m * n are not appropriate
notations in this respect. Obviously, a 1 × n matrix is a row vector of n elements and
an n × 1 matrix is a column vector of n elements. Thus all items in Chapter 1 become
special cases of the various properties of matrices.
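
One common way to hold a matrix in a program is as a list of rows. The sketch below is an illustration under that assumption; the helper name `shape` is hypothetical. It returns (m, n) and rejects a configuration in which some positions are empty, in line with the remark that such an arrangement is not a matrix.

```python
def shape(rows):
    """Return (m, n) for a matrix stored as a list of rows."""
    if not rows:
        raise ValueError("a matrix needs at least one row")
    n = len(rows[0])
    if n == 0 or any(len(row) != n for row in rows):
        raise ValueError("not a matrix: every row must have the same length")
    return len(rows), n


A2 = [[5, 1], [-1, 4], [2, 0]]   # 3 rows, 2 columns
A3 = [[5, 1, -1, 4, 2, 0]]       # 1 row, 6 columns (a row vector)
print(shape(A2), shape(A3))
```

Note that the pair (m, n) is reported, never the product mn, which matches the convention that × only separates the two numbers.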


Example 2.0.1 (Grades of students). Let the following tables give the grades, in
percentages, obtained by 3 students in four class tests in two courses:

Course 1

               test 1   test 2   test 3   test 4

    Student 1    80       85       90       82
    Student 2    65       60       70       72
    Student 3    75       72       74       78

Course 2

               test 1   test 2   test 3   test 4

    Student 1    99       92       90       95
    Student 2    60       62       65       63
    Student 3    80       72       77       81

Open Access. © 2017 Arak M. Mathai, Hans J. Haubold, published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
https://doi.org/10.1515/9783110562507-002


The number at each position signifies something. The first row second column entry 
for Course 1, namely 85, is the grade of student 1 in test 2 in course 1. Thus the en- 
tries cannot be arbitrarily interchanged. The interchanged arrangement will signify 
something different from the original arrangement. 


A convenient notation that can be used to denote a matrix is by denoting the
element at the i-th row, j-th column position in a matrix A by $a_{ij}$. In this case we write a
general matrix in the form

$$A = (a_{ij}) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \tag{2.0.1}$$

The elements are enclosed by square brackets [·] or by ordinary brackets (·). The
notation $A = (a_{ij})$ means the matrix where the i-th row, j-th column element or the (i, j)-th
element is $a_{ij}$ for all i and j. In Example 2.0.1 for course 1, for example,

$$a_{11} = 80, \quad a_{13} = 90, \quad a_{22} = 60, \quad a_{24} = 72, \quad a_{31} = 75, \quad a_{34} = 78.$$


The elements are usually separated by spaces. If there is possibility of confusion then 
the elements in the configuration are separated by commas. When some of the ele- 
ments are numbers, some are long expressions involving some variables etc there is 
possibility of confusion. In this case we will separate the elements by commas. 

If the matrix of grades in course 2 is denoted by $B = (b_{ij})$ then

$$A = \begin{bmatrix} 80 & 85 & 90 & 82 \\ 65 & 60 & 70 & 72 \\ 75 & 72 & 74 & 78 \end{bmatrix}, \quad
B = \begin{bmatrix} 99 & 92 & 90 & 95 \\ 60 & 62 & 65 & 63 \\ 80 & 72 & 77 & 81 \end{bmatrix}.$$

For example, $b_{13} = 90$, $b_{24} = 63$, $b_{32} = 72$. Here $a_{11} = 80 \neq b_{11} = 99$ whereas $a_{32} = 72 = b_{32}$.
If the students had exactly the same profiles of grades in the two courses then the
corresponding entries would have been all equal, that is $a_{ij} = b_{ij}$ for all i and j.


2.1 Various definitions 


A lot of technical terms and definitions will be introduced, all at once, since they can 
be recognized easily and the properties can be memorized without difficulty. 

Definition 2.1.1 (Equality of matrices). Two matrices A and B are said to be equal if
(i) they are of the same category, that is, if A is m × n then B has to be m × n, (ii)
elementwise they must be equal, that is, $a_{ij} = b_{ij}$ for all i and j.


Example 2.1.1. Let 


el -1 Ji »- ^ b Al 
0 3 x 0 3 5 
Here both C and D are 2 × 3 matrices. C = D if and only if (iff) 1 = a, -1 = b, x = 5. If all
elements in C are equal to the corresponding elements in D, except for one pair, still
$C \neq D$.


In Example 2.0.1 if the grades are to be represented out of 20 points each, rather
than as percentages, then each grade is to be divided by 5. Then the whole
configuration of grades is to be divided by 5, that is, each element there is to be divided by 5. In this
case we say that the whole matrix is divided by 5. For example, for A, the configuration
of grades out of 20 points will then be

$$\begin{bmatrix} \frac{80}{5} & \frac{85}{5} & \frac{90}{5} & \frac{82}{5} \\ \frac{65}{5} & \frac{60}{5} & \frac{70}{5} & \frac{72}{5} \\ \frac{75}{5} & \frac{72}{5} & \frac{74}{5} & \frac{78}{5} \end{bmatrix}
= \frac{1}{5}\begin{bmatrix} 80 & 85 & 90 & 82 \\ 65 & 60 & 70 & 72 \\ 75 & 72 & 74 & 78 \end{bmatrix} = \frac{1}{5}A.$$

Definition 2.1.2 (Scalar multiplication of matrices).

$$cA = (ca_{ij})$$

where c is a scalar quantity (1 × 1 matrix).


That is, if every element in A is multiplied by c then we say that the matrix A is 
multiplied by c. As a convention the scalar quantity c is written on the left of A as cA, 
and not on the right as Ac, when writing a scalar multiple of A. 


Example 2.1.2. Let 


Then 


24-12 2|, a- 1], oa- °l. 
4 10 -2 -5 0 0 
Definition 2.1.3 (A null matrix). If all the elements in a matrix are zeros it is called a 
null matrix and it is denoted by a big O. 

Definition 2.1.4 (Square and rectangular matrices). In an m × n matrix if m = n, that
is, the number of rows is equal to the number of columns, then it is called a square
matrix. Non-square matrices are called rectangular matrices:


1 -1 
l 5 | = a2x2square matrix, 


2 1 5 " 
E 1 - a2x3rectangular matrix, 


[1 -1 2 3]- a1x4rectangular matrix or a row vector, 


1 
| | = a2x1rectangular matrix or a column vector. 


Note that vectors of Chapter 1 are all either 1 x n (row vector) or n x 1 (column vector) 
rectangular matrices. 


Definition 2.1.5 (A diagonal matrix). In an n × n square matrix $A = (a_{ij})$ if $a_{ij} = 0$ for
all i and j, $i \neq j$, and at least one $a_{ii} \neq 0$, $i = 1,\ldots,n$, then A is called a diagonal matrix.


That is, the matrix has to be square, and all elements other than the ones on the
leading diagonal (the diagonal starting from the top left corner and going down; in a
square matrix this diagonal ends at the bottom right corner) are zeros and there is at
least one nonzero element on the diagonal. If all the elements on the diagonal are also
zeros then obviously it is a null matrix. A null matrix is not counted among diagonal
matrices even if the null matrix is a square matrix. For example,


0 

1 0] fo o 
of do ap [o 5 
1 


are all diagonal matrices. A convenient notation for a diagonal matrix is the following: 


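$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}
= \text{diag}(d_1,\ldots,d_n) \tag{2.1.1}$$

which means D is a diagonal matrix with the diagonal elements $d_1,\ldots,d_n$ respectively
or to be more specific, to indicate rows and columns, we may write

$$D = \text{diag}(d_{11},\ldots,d_{nn}).$$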


Definition 2.1.6 (A triangular matrix). If in a square matrix all elements above the
leading diagonal are zeros (there may be some zeros on the diagonal and below the
diagonal also) then it is called a lower triangular matrix and if all elements below the
leading diagonal are zeros then it is called an upper triangular matrix. (A null matrix
is not counted as a triangular matrix or a diagonal matrix.)

For example,

$$\begin{bmatrix} 1 & 0 & 0 \\ 3 & 0 & 0 \\ -1 & 1 & 5 \end{bmatrix} \text{ is lower triangular}, \quad
\begin{bmatrix} 7 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & 2 \end{bmatrix} \text{ is upper triangular}.$$


Definition 2.1.7 (Identity and scalar matrices). A diagonal matrix with all diagonal
elements equal to d, $d \neq 1$, $d \neq 0$, is called a scalar matrix and if d = 1, that is, all
diagonal elements are equal to 1, then it is called an identity matrix. An identity
matrix is denoted by I, or by $I_n$ if the order is to be indicated, that is, that it is an n × n
matrix.


For example, 


oOooo-2 
oor o] 
oOo- oco 
Pe OO O° 


Definition 2.1.8 (The transpose of a matrix). If the i-th row of an m × n matrix A is
written as the i-th column for all i = 1, ..., m then the new matrix thus obtained is called
the transpose of A and it is usually denoted as A′ (A prime) or $A^T$ (A transpose).


Thus when A is m × n then A′ is n × m. For example,

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & 2 \end{bmatrix} \;\Rightarrow\; A' = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 2 \end{bmatrix};$$

$$B = (0, 1, 5) \;\Rightarrow\; B' = \begin{bmatrix} 0 \\ 1 \\ 5 \end{bmatrix};$$

$$C = \begin{bmatrix} 1 \\ -1 \end{bmatrix} \;\Rightarrow\; C' = (1, -1).$$


(i) The transpose of a 1 x 1 matrix (scalar quantity) is itself. 
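
The row-to-column rewriting in Definition 2.1.8 is easy to express in code. The following sketch assumes a matrix is stored as a list of rows; the function name `transpose` and the sample matrices are illustrative choices rather than part of the text.

```python
def transpose(matrix):
    """Write the i-th row of the matrix as the i-th column."""
    return [list(column) for column in zip(*matrix)]


A = [[1, 1, 1], [1, -1, 2]]   # a 2 x 3 matrix
B = [[0, 1, 5]]               # a 1 x 3 row vector
s = [[7]]                     # a 1 x 1 matrix (scalar quantity)

print(transpose(A))   # a 3 x 2 matrix
print(transpose(B))   # a 3 x 1 column vector
print(transpose(s))   # unchanged, as in property (i)
```

Applying `transpose` twice returns the original matrix, which is a quick way to test such a helper.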


Definition 2.1.9 (A symmetric matrix). If a square matrix $A = (a_{ij})$ is such that $a_{ij} = a_{ji}$,
that is, the element in the (i, j)-th position is equal to the element in the (j, i)-th
position for all i and j, then A is called a symmetric matrix. That is, if A is symmetric then

$$A = A'.$$

For example,


1 7 A se 2 0 
isl d B-|2 O0 -4|-B' = ek 
3 -4 7 


The following properties are immediate: 


(ii) I = I′; D = D′ (D a diagonal matrix); (lower triangular)′ = upper triangular and
vice versa; O′ = O (when O is a square null matrix).


Definition 2.1.10 (A skew symmetric matrix). If a square matrix $A = (a_{ij})$ is such that
$a_{ij} = -a_{ji}$ for all i and j then A is called a skew symmetric matrix. That is, A′ = −A.


For example,

$$A = \begin{bmatrix} 0 & -4 \\ 4 & 0 \end{bmatrix} \;\Rightarrow\; A' = -A$$

$$B = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & -2 \\ -3 & 2 & 0 \end{bmatrix} \;\Rightarrow\; B' = -B.$$

Note that when $A = (a_{ij})$ is skew symmetric then $a_{ii} = -a_{ii}$ which means $a_{ii} = 0$. That is,
all the diagonal elements are zeros.


(iii) All the leading diagonal elements of a skew symmetric matrix are zeros. 
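
Symmetry and skew symmetry can be tested directly from the definitions A = A′ and A′ = −A. The sketch below is illustrative only: the helper names are assumptions, and matrices are stored as lists of rows.

```python
def transpose(matrix):
    """Rows become columns."""
    return [list(column) for column in zip(*matrix)]


def is_symmetric(matrix):
    """True when A equals its transpose (this also forces A to be square)."""
    return matrix == transpose(matrix)


def is_skew_symmetric(matrix):
    """True when the transpose of A equals -A."""
    negated = [[-x for x in row] for row in matrix]
    return transpose(matrix) == negated


B = [[0, 1, 3], [-1, 0, -2], [-3, 2, 0]]   # skew symmetric example
S = [[1, 7], [7, 2]]                       # a hypothetical symmetric matrix
print(is_skew_symmetric(B), is_symmetric(S))
print([B[i][i] for i in range(len(B))])    # diagonal of a skew symmetric matrix
```

A rectangular matrix fails both tests automatically because its transpose has a different shape.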


Example 2.1.3 (Consumption profiles). Suppose that the following tables give the
quantities (all in kg (kilograms)) of food items consumed by 3 families over two
weeks.

week 1

               beef   pork   chicken   beans

    family 1    10      5      10        10
    family 2     8      7       8        10
    family 3    10     15      10        12

week 2

               beef   pork   chicken   beans

    family 1    10     15      15         5
    family 2     8     10      10        12
    family 3    12     15      12        10

The two matrices of weekly consumption, for all the 3 families, are therefore

$$A = \begin{bmatrix} 10 & 5 & 10 & 10 \\ 8 & 7 & 8 & 10 \\ 10 & 15 & 10 & 12 \end{bmatrix} \quad \text{and} \quad
B = \begin{bmatrix} 10 & 15 & 15 & 5 \\ 8 & 10 & 10 & 12 \\ 12 & 15 & 12 & 10 \end{bmatrix}.$$

If we want to find the profile of total consumption in the two weeks together then we
add the corresponding elements. That is,

$$A + B = \begin{bmatrix} 10+10 & 5+15 & 10+15 & 10+5 \\ 8+8 & 7+10 & 8+10 & 10+12 \\ 10+12 & 15+15 & 10+12 & 12+10 \end{bmatrix}
= \begin{bmatrix} 20 & 20 & 25 & 15 \\ 16 & 17 & 18 & 22 \\ 22 & 30 & 22 & 22 \end{bmatrix}.$$


Definition 2.1.11 (Sum of two matrices). It is defined only for matrices of the same
category, both must be m × n (same m, same n for both). Let $A = (a_{ij})$ and $B = (b_{ij})$. Then
the sum is defined as

$$A + B = (a_{ij} + b_{ij})$$

or the matrix obtained by adding the corresponding elements as in the illustrative
example.


For example, 


(1,1, -1) + (2,0, 1) = (3,1,0); 


(1,1,—1) + 


not defined; 


EERST 
2 -1 0 MAE 


N e 
diris 
oo 
| 


8 8 
1 J+{ 1 J={ 2p 
-1 2 1 
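$$\begin{bmatrix} 5 & 2 & -1 \\ 1 & 0 & -1 \\ 2 & 2 & 4 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 3 & -1 \\ 1 & 2 & -1 \\ 2 & 2 & 5 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} + \begin{bmatrix} 5 & 2 & -1 \\ 1 & 0 & -1 \\ 2 & 2 & 4 \end{bmatrix}.$$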



We can extend this definition to any number of matrices of the same category.
Combining with the definition of scalar multiplication we can define a linear function of
two matrices (or of several matrices) of the same category. That is,

$$\alpha A + \beta B = (\alpha a_{ij} + \beta b_{ij}) \tag{2.1.2}$$

where $\alpha$, $\beta$ are scalars and $A = (a_{ij})$ and $B = (b_{ij})$. That is, the corresponding linear
functions of the elements are taken. Now, we can establish the following properties easily.
For matrix addition we use the symbol +.

(iv) $-A = (-1)A$; $A - A = O$; $A + O = A$;
     $A + B = B + A$; $A + (B + C) = (A + B) + C$;
     $\alpha(A + B) = \alpha A + \alpha B$ (2.1.3)

where $\alpha$ is a scalar.
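
The linear function in (2.1.2) can be sketched in code as an elementwise operation. The function name `linear_combination` is a hypothetical choice; the matrices A and B are the weekly consumption matrices of Example 2.1.3, stored as lists of rows.

```python
def linear_combination(alpha, A, beta, B):
    """Return alpha*A + beta*B for two matrices of the same category."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("A and B must both be m x n")
    return [[alpha * a + beta * b for a, b in zip(ra, rb)]
            for ra, rb in zip(A, B)]


A = [[10, 5, 10, 10], [8, 7, 8, 10], [10, 15, 10, 12]]    # week 1
B = [[10, 15, 15, 5], [8, 10, 10, 12], [12, 15, 12, 10]]  # week 2

total = linear_combination(1, A, 1, B)        # A + B
difference = linear_combination(1, A, -1, A)  # A - A, a null matrix
print(total)
print(difference)
```

Taking both scalars equal to 1 gives the ordinary sum, and taking them as 1 and -1 gives the difference, so one helper covers several of the listed properties.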


Example 2.1.4. In Example 2.1.3 if the price per kg for beef, pork, chicken and beans
for week 1 are respectively (2, 1, 0.5, 3) dollars and those for week 2 are respectively
(2.1, 1.2, 0.8, 3.2) dollars then construct the expense profiles for week 1 and week 2 for
the 3 families.


Solution 2.1.4. If the price vectors are

$$U = \begin{bmatrix} 2 \\ 1 \\ 0.5 \\ 3 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 2.1 \\ 1.2 \\ 0.8 \\ 3.2 \end{bmatrix}$$

respectively then the money value, in dollars, of the expense profiles for the two weeks
are the following: For the first week, writing it as AU,

$$AU = \begin{bmatrix} 10 & 5 & 10 & 10 \\ 8 & 7 & 8 & 10 \\ 10 & 15 & 10 & 12 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 0.5 \\ 3 \end{bmatrix}
= \begin{bmatrix} (10)(2) + (5)(1) + (10)(0.5) + (10)(3) \\ (8)(2) + (7)(1) + (8)(0.5) + (10)(3) \\ (10)(2) + (15)(1) + (10)(0.5) + (12)(3) \end{bmatrix}
= \begin{bmatrix} 60 \\ 57 \\ 76 \end{bmatrix}$$

and that for the second week, writing it as BV, we have

$$BV = \begin{bmatrix} 10 & 15 & 15 & 5 \\ 8 & 10 & 10 & 12 \\ 12 & 15 & 12 & 10 \end{bmatrix}\begin{bmatrix} 2.1 \\ 1.2 \\ 0.8 \\ 3.2 \end{bmatrix}
= \begin{bmatrix} (10)(2.1) + (15)(1.2) + (15)(0.8) + (5)(3.2) \\ (8)(2.1) + (10)(1.2) + (10)(0.8) + (12)(3.2) \\ (12)(2.1) + (15)(1.2) + (12)(0.8) + (10)(3.2) \end{bmatrix}
= \begin{bmatrix} 67 \\ 75.2 \\ 84.8 \end{bmatrix}.$$

The total expenses for the two weeks combined is then

$$\begin{bmatrix} 60 \\ 57 \\ 76 \end{bmatrix} + \begin{bmatrix} 67 \\ 75.2 \\ 84.8 \end{bmatrix} = \begin{bmatrix} 127 \\ 132.2 \\ 160.8 \end{bmatrix}.$$

Some sort of multiplication and addition of matrices is involved in calculating the 
combined expense profile of the three families for two weeks. We will define matrix 
multiplication in a formal way. 
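
The expense profiles of Example 2.1.4 can be recomputed as follows. This is a sketch under the assumption that a matrix is a list of rows and a column vector is a plain list; the helper name `mat_vec` is illustrative.

```python
def mat_vec(A, v):
    """Dot product of each row of A with the column vector v."""
    if any(len(row) != len(v) for row in A):
        raise ValueError("each row of A must have as many entries as v")
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[10, 5, 10, 10], [8, 7, 8, 10], [10, 15, 10, 12]]    # week 1 quantities
B = [[10, 15, 15, 5], [8, 10, 10, 12], [12, 15, 12, 10]]  # week 2 quantities
U = [2, 1, 0.5, 3]        # week 1 prices per kg
V = [2.1, 1.2, 0.8, 3.2]  # week 2 prices per kg

week1 = mat_vec(A, U)
week2 = mat_vec(B, V)
total = [p + q for p, q in zip(week1, week2)]
print(week1, week2, total)
```

Up to floating-point rounding, the three printed vectors should match the expense profiles worked out in Solution 2.1.4.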


The matrix $A = (a_{ij})$ postmultiplied by $B = (b_{ij})$, denoted as AB (or B premultiplied
by A), is defined only when A and B are of the following types: A is m × n and B is n × r,
that is, the number of columns of A is equal to the number of rows of B. For example,

if A is 2 × 5, B is 5 × 4 then AB is defined but BA is not defined;

if A is 3 × 3, B is 3 × 3 then AB is defined and BA is also defined;

if A is 1 × n and B is n × 1 then AB and BA are defined;

if A is 3 × 4 and B is 2 × 3 then AB is not defined but BA is defined.

Definition 2.1.12 (Multiplication of matrices). Let $A = (a_{ij})$ be m × n and $B = (b_{ij})$ be
n × r then AB is an m × r matrix where the (i, j)-th element in AB is the dot product of
the i-th row vector of A with the j-th column vector of B.


The i-th row vector of A, denoted by $\alpha_i$, is the following:

$$\alpha_i = (a_{i1}, a_{i2}, \ldots, a_{in}).$$

The j-th column vector of B, denoted by $\beta_j$, is

$$\beta_j = \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}.$$

Then writing element-wise multiplication and addition as

$$\alpha_i\beta_j = (a_{i1}, \ldots, a_{in})\begin{bmatrix} b_{1j} \\ \vdots \\ b_{nj} \end{bmatrix}
= a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}.$$

If AB is denoted by $AB = C = (c_{ij})$ then

$$c_{ij} = \alpha_i\beta_j = \sum_{k=1}^{n} a_{ik}b_{kj}. \tag{2.1.4}$$


Symbolically, the multiplication proceeds along each row of A (from left to right)
and down each column of B (from top to bottom).

The first row of A dot product with the various columns of B gives the first row 
of C = AB, the second row of A dot product with the various columns of B gives the 
second row of C, and so on. 
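
Formula (2.1.4) translates almost literally into code. The sketch below assumes matrices stored as lists of rows; the name `mat_mul` is illustrative. It raises an error when the number of columns of A differs from the number of rows of B, that is, when AB is not defined.

```python
def mat_mul(A, B):
    """Return C = AB with c_ij = sum over k of a_ik * b_kj."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("AB is not defined: columns of A must equal rows of B")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(len(A))]


A = [[0, 1, -1, 2], [-1, 1, 5, 4]]   # 2 x 4
B = [[1], [2], [-1], [0]]            # 4 x 1
C = mat_mul(A, B)                    # 2 x 1
print(C)
```

Here `mat_mul(B, A)` raises a `ValueError`, since B has one column while A has two rows, which illustrates that BA need not be defined when AB is.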


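Example 2.1.5. Evaluate the product of the matrices A and B to obtain AB, wherever
AB is defined:

(a) $A = (1, -1, 1, 2)$, $B = \begin{bmatrix} 0 \\ 1 \\ 2 \\ -1 \end{bmatrix}$;

(b) $A = \begin{bmatrix} 0 & 1 & -1 & 2 \\ -1 & 1 & 5 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix}$;

(c) $A = \begin{bmatrix} 2 & 1 & -1 \\ 3 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 0 & 0 \end{bmatrix}$;

(d) A is a 3 × 3 matrix and B is a 2 × 4 matrix;

(e) $A = I_3$, $B = (b_{ij})$ is 3 × 3;

(f) $A = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $B = (b_{ij})$ is 3 × 3.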


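Solutions 2.1.5.

(a) $$AB = (1, -1, 1, 2)\begin{bmatrix} 0 \\ 1 \\ 2 \\ -1 \end{bmatrix} = (1)(0) + (-1)(1) + (1)(2) + (2)(-1) = -1.$$

(b) $$AB = \begin{bmatrix} 0 & 1 & -1 & 2 \\ -1 & 1 & 5 & 4 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix}$$

where A is 2 × 4 and B is 4 × 1. Thus AB is 2 × 1, which can be remembered symbolically
as (2 × 4)(4 × 1) = 2 × 1. The first row element in AB is

$$(0, 1, -1, 2)\begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix} = (0)(1) + (1)(2) + (-1)(-1) + (2)(0) = 3$$

and the second row element in AB is

$$(-1, 1, 5, 4)\begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix} = (-1)(1) + (1)(2) + (5)(-1) + (4)(0) = -4.$$

Hence

$$AB = \begin{bmatrix} 3 \\ -4 \end{bmatrix}.$$

(c) $$AB = \begin{bmatrix} 2 & 1 & -1 \\ 3 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} -1 & 1 \\ 1 & -1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} (2)(-1) + (1)(1) + (-1)(0), & (2)(1) + (1)(-1) + (-1)(0) \\ (3)(-1) + (1)(1) + (1)(0), & (3)(1) + (1)(-1) + (1)(0) \\ (1)(-1) + (1)(1) + (1)(0), & (1)(1) + (1)(-1) + (1)(0) \end{bmatrix}
= \begin{bmatrix} -1 & 1 \\ -2 & 2 \\ 0 & 0 \end{bmatrix}.$$

(d) Since A is 3 × 3 and B is 2 × 4, AB is not defined.

(e) Here A is an identity matrix and B is a general matrix.

$$AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
= \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = B.$$

(f) $$AB = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = O.$$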

It is interesting to observe the following general properties. As long as the products
are defined

(v) $(A')' = A$; $IA = A$; $BI = B$; $OC = O$; $DO = O$; $AB \neq BA$;
    $ABC \neq ACB$ (2.1.5)

where A, B, C, D are arbitrary matrices and I and O denote the identity matrix and
the null matrix respectively, and when the products are defined. The two inequalities
hold in general, that is, the two sides need not be equal.


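Also note that if X is an n × 1 column vector then

$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \;\Rightarrow\; X'X = (x_1, \ldots, x_n)\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
= x_1^2 + x_2^2 + \cdots + x_n^2 \tag{2.1.6}$$

whereas

$$XX' = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}(x_1, \ldots, x_n)
= \begin{bmatrix} x_1^2 & x_1x_2 & \cdots & x_1x_n \\ x_2x_1 & x_2^2 & \cdots & x_2x_n \\ \vdots & \vdots & & \vdots \\ x_nx_1 & x_nx_2 & \cdots & x_n^2 \end{bmatrix}. \tag{2.1.7}$$

That is, X'X is a scalar (1 × 1 matrix) whereas XX' is an n × n matrix.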
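
The contrast between X'X and XX' can be seen with a small numerical sketch. The vector `x` below is a hypothetical example and the function names `inner` and `outer` are illustrative; a column vector is stored as a plain list.

```python
def inner(x):
    """X'X: a scalar, the sum of squares of the elements, as in (2.1.6)."""
    return sum(xi * xi for xi in x)


def outer(x):
    """XX': an n x n matrix whose (i, j)-th element is x_i * x_j, as in (2.1.7)."""
    return [[xi * xj for xj in x] for xi in x]


x = [1, -2, 3]   # hypothetical 3 x 1 column vector
print(inner(x))
print(outer(x))
```

The matrix XX' is symmetric, and the sum of its diagonal elements equals the scalar X'X.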




Example 2.1.6. Write the following systems of linear equations in matrix notation: 
(a) 2x4 -X5 4X4 24 
Xi +X% -X3 =2; 
(b) 3x; +X% +X% =1 
X; -2% +X% 23 
2X1 + X2 = X3 = 25 
(c) | 5q-X,*x-x4,-0 
2X1 + X; - 3X3 + x, =0. 
Solution 2.1.6. (a) The first equation in the first set can be written as 


x 
Q -1 1)( x, |-4 


and the second as 


and combining the two we have 


2 c wp 4 
X% |= or AX=b 
1 -1 1 2 


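where A is the coefficient matrix.

(b) In the same notation, AX = b with

$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & -2 & 1 \\ 2 & 1 & -1 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}.$$

(c) Writing the two equations together as AX = O we have

$$\begin{bmatrix} 1 & -1 & 1 & -1 \\ 2 & 1 & -3 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = O$$

where O is a null vector.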


Example 2.1.7 (Linear forms). Write the following linear forms (all terms are homogeneous of degree 1 each) in matrix notation:

(a) $y = x_1 - 3x_2 + x_3$

(b) $y = x_1 - x_2 - x_3 + 2x_4$

(c) $y_1 = 2x_1 - x_2 + x_3$
    $y_2 = x_1 + x_2$

(d) $y_1 = a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n$
    $\vdots$
    $y_m = a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n$

where the $a_{ij}$'s are constants.


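Solution 2.1.7.

(a) $$y = (1, -3, 1)\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = (x_1, x_2, x_3)\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix} = a'X = X'a$$

where

$$a = \begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

(b) $y = a'X = X'a$ where

$$a = \begin{bmatrix} 1 \\ -1 \\ -1 \\ 2 \end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.$$

(c) Writing the two linear forms together as one equation Y = AX we have

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

(d) Writing the m linear forms together as one equation Y = AX we have

$$\begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

The representation in (d) is a general linear form or it can also be considered as a linear transformation of the vector X into the vector Y. It is linear in the sense that each $y_i$ is a linear function (of the first degree in every term) of $x_1, \ldots, x_n$.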


Example 2.1.8 (Quadratic forms). Write the following quadratic forms (all terms are homogeneous of degree 2 each) in matrix notation:

(a) $y = x_1^2 + x_2^2 + x_3^2$

(b) $y = x_1^2 - x_2^2 + x_3^2$

(c) $y = 2x_1^2 + x_2^2 - x_3^2 + 5x_1x_2 - 2x_1x_3 + x_2x_3$

(d) $y = x_2^2 + 4x_1x_2 - 2x_2x_3$

(e) $y = \sum_{i=1}^{n} a_{ii}x_i^2 + 2\sum_{i<j} a_{ij}x_ix_j = \sum_{i,j} a_{ij}x_ix_j$.
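Solutions 2.1.8.

(a) This is a simple sum of squares.

$$y = X'X, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

(b) Here the coefficients of $x_1^2$, $x_2^2$ and $x_3^2$ are different. This format can be created by a diagonal matrix. That is,

$$y = X'AX, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$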

(c) Here the product terms are also present. Hence A has nonzero off-diagonal 
elements. 
Note that the coefficient of $x_1x_2$ is $a_{12} = 5$. But $x_1x_2 = x_2x_1$. Hence we could have written 5 as $a_{21}$ instead of $a_{12}$, or we could have distributed 5 equally between $a_{12}$ and $a_{21}$, that is, $a_{12} = a_{21} = \frac{5}{2}$, which is a symmetric format. Writing A in a symmetric format we have the same quadratic form written as

$$y = X'AX, \quad A = \begin{bmatrix} 2 & \frac{5}{2} & -1 \\ \frac{5}{2} & 1 & \frac{1}{2} \\ -1 & \frac{1}{2} & -1 \end{bmatrix}.$$

In this format, the diagonal elements of A are the coefficients of the square terms, that is, $a_{ii}$ is the coefficient of $x_i^2$, and the non-diagonal elements come from the coefficients of the corresponding product terms, distributed equally so that $a_{ij} = a_{ji}$; that is, the coefficient of $x_ix_j$ as well as that of $x_jx_i$ is taken as $(a_{ij} + a_{ji})/2$ in order to write A as a symmetric matrix, for elegance.


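(d) $$y = X'AX, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 2 & 0 \\ 2 & 1 & -1 \\ 0 & -1 & 0 \end{bmatrix}.$$

Here A is written as a symmetric matrix. We could have written the quadratic form in many different ways if we did not want A to be symmetric. For example, the same quadratic form is

$$y = X'BX = X'CX$$

where

$$B = \begin{bmatrix} 0 & 4 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix} \quad\text{and}\quad C = \begin{bmatrix} 0 & 1 & 0 \\ 3 & 1 & 0 \\ 0 & -2 & 0 \end{bmatrix}.$$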


The following is a general quadratic form:

(e) $$y = X'AX, \quad X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad A = (a_{ij})$$

and without loss of generality we can take A to be symmetric, that is, $A = A'$.

(vi) The matrix A of the quadratic form X'AX can be taken as symmetric, that is, $A = A'$, without loss of generality.

If A is not symmetric then the quadratic form can be rewritten equivalently as

$$X'AX = X'BX, \quad\text{where } B = \tfrac{1}{2}(A + A') = B'.$$


Thus B is symmetric. 
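Property (vi) can be illustrated with the non-symmetric matrix B of Solution 2.1.8 (d). The sketch below is a hedged illustration in plain Python: the function names `quad_form` and `symmetrize` and the test point are assumptions introduced here, and exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def quad_form(A, x):
    """Evaluate X'AX for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetrize(A):
    """Return (A + A')/2, the symmetric matrix giving the same quadratic form."""
    n = len(A)
    return [[Fraction(A[i][j] + A[j][i], 2) for j in range(n)] for i in range(n)]

# Non-symmetric matrix for y = x2^2 + 4 x1 x2 - 2 x2 x3, as in Solution 2.1.8 (d).
B = [[0, 4, 0],
     [0, 1, -2],
     [0, 0, 0]]
S = symmetrize(B)       # should be the symmetric A of part (d)

x = [2, -1, 3]          # arbitrary test point chosen for illustration
print(S)
print(quad_form(B, x), quad_form(S, x))
```

The two printed values agree because the off-diagonal coefficients are only redistributed, never changed in total.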
The following general properties can be observed for transposes and products. The student may verify them by taking arbitrary 3 × 3 matrices:

$$(A')' = A; \quad (A + B)' = A' + B'; \quad (AB)' = B'A'; \quad (AA')' = AA';$$
$$I' = I; \quad O' = O; \quad (A_1A_2 \cdots A_k)' = A_k' \cdots A_2'A_1'. \tag{2.1.8}$$


Whenever the sums and products are defined, 
the transpose of a lower triangular matrix is upper triangular; 
the transpose of an upper triangular matrix is lower triangular; 
the transpose of a diagonal matrix is diagonal. 


Whenever the product is defined, 
the product of two identity matrices is an identity matrix; 
the product of two diagonal matrices is a diagonal matrix; 
the product of a null matrix with any other matrix is null; 
the product of two lower triangular matrices is lower triangular; 
the product of two upper triangular matrices is upper triangular. 


Whenever the sum is defined, 
the sum of two lower triangular matrices is lower triangular; 
the sum of two upper triangular matrices is upper triangular; 
the sum of two diagonal matrices is diagonal; 
the sum of two identity matrices is a scalar matrix; 
the sum of two symmetric matrices is symmetric; 
the sum of two skew symmetric matrices is skew symmetric. 


2.1.1 Some more practical situations 


Many situations from various disciplines can be listed where systems of entities can 
be written in nice elegant simplified forms with the help of matrices. A few more situ- 
ations will be listed here where only the sums and products of matrices are involved. 


Example 2.1.9 (The Jacobian matrix). Consider the following system of linear equations:

$$\begin{aligned} y_1 &= 2x_1 + x_2 - x_3 + x_4 \\ y_2 &= x_1 + 3x_2 + x_3 + 2x_4 \\ y_3 &= -x_1 + x_2 + x_3 - x_4 \\ y_4 &= x_1 + x_2 + x_3 + x_4 \end{aligned}$$

which can be written in matrix notation as

$$Y = AX, \quad Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix}, \quad A = \begin{bmatrix} 2 & 1 & -1 & 1 \\ 1 & 3 & 1 & 2 \\ -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}.$$

Consider the partial derivative of $y_i$ with respect to $x_j$ for all i and j and let the matrix of these partial derivatives be denoted by

$$\frac{\partial Y}{\partial X} = \left(\frac{\partial y_i}{\partial x_j}\right).$$

Then the matrix $\frac{\partial Y}{\partial X}$ is called the Jacobian matrix of this transformation X to Y. Evaluate the Jacobian matrix in the above transformation.


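Solution 2.1.9. From the first equation,

$$\frac{\partial y_1}{\partial x_1} = 2, \quad \frac{\partial y_1}{\partial x_2} = 1, \quad \frac{\partial y_1}{\partial x_3} = -1, \quad \frac{\partial y_1}{\partial x_4} = 1,$$

and similarly for $y_2$, $y_3$, $y_4$. Then

$$\frac{\partial Y}{\partial X} = \left(\frac{\partial y_i}{\partial x_j}\right) = \begin{bmatrix} 2 & 1 & -1 & 1 \\ 1 & 3 & 1 & 2 \\ -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{bmatrix} = A,$$

the coefficient matrix in Y = AX.

Instead of linear functions if

$$y_i = f_i(x_1, \ldots, x_k), \quad i = 1, \ldots, k$$

then the Jacobian matrix is still

$$\frac{\partial Y}{\partial X} = \left(\frac{\partial y_i}{\partial x_j}\right)$$

where $\frac{\partial y_i}{\partial x_j}$ is the partial derivative of $y_i$ with respect to $x_j$. The Jacobian matrices are relevant only when the number of $x_1, \ldots, x_k$, that is k, is the same as the number of $y_1, \ldots, y_k$, and further, we should be able to write $x_1, \ldots, x_k$ uniquely in terms of $y_1, \ldots, y_k$ and vice versa. In this case we say that there is a one-to-one transformation.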
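Example 2.1.10 (Derivative of a quadratic form). Let

$$u = 3x_1^2 + x_2^2 - 2x_3^2 - 2x_1x_2 + x_1x_3$$

and consider the differential operator discussed in Chapter 1, namely,

$$\frac{\partial}{\partial X} = \begin{bmatrix} \frac{\partial}{\partial x_1} \\ \vdots \\ \frac{\partial}{\partial x_n} \end{bmatrix}. \tag{2.1.9}$$

Here n = 3. Evaluate $\frac{\partial u}{\partial X}$.

Solution 2.1.10. Writing A as a symmetric matrix

$$u = X'AX, \quad X' = (x_1, x_2, x_3), \quad A = \begin{bmatrix} 3 & -1 & \frac{1}{2} \\ -1 & 1 & 0 \\ \frac{1}{2} & 0 & -2 \end{bmatrix},$$

$$\frac{\partial u}{\partial X} = \begin{bmatrix} \frac{\partial u}{\partial x_1} \\ \frac{\partial u}{\partial x_2} \\ \frac{\partial u}{\partial x_3} \end{bmatrix} = \begin{bmatrix} 6x_1 - 2x_2 + x_3 \\ -2x_1 + 2x_2 \\ x_1 - 4x_3 \end{bmatrix} = 2\begin{bmatrix} 3 & -1 & \frac{1}{2} \\ -1 & 1 & 0 \\ \frac{1}{2} & 0 & -2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = 2AX.$$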

This, in fact, is a general result. 
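The identity ∂u/∂X = 2AX for the quadratic form of Example 2.1.10 can be checked by differentiating numerically. The sketch below is an illustration only: the function names and the evaluation point are assumptions introduced here. Central differences are exact for a quadratic function, and rational arithmetic avoids rounding, so the two sides can be compared with equality.

```python
from fractions import Fraction as F

# Symmetric matrix of the quadratic form in Example 2.1.10.
A = [[F(3), F(-1), F(1, 2)],
     [F(-1), F(1), F(0)],
     [F(1, 2), F(0), F(-2)]]

def u(x):
    """u = 3 x1^2 + x2^2 - 2 x3^2 - 2 x1 x2 + x1 x3."""
    x1, x2, x3 = x
    return 3 * x1**2 + x2**2 - 2 * x3**2 - 2 * x1 * x2 + x1 * x3

def grad_central(f, x, h=F(1, 1000)):
    """Gradient of f at x by central differences (exact for quadratics)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def two_AX(A, x):
    """Return the vector 2AX."""
    n = len(x)
    return [2 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

x = [F(1), F(2), F(-1)]     # arbitrary evaluation point
print(grad_central(u, x))
print(two_AX(A, x))
```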


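(vii) Let X be a k × 1 vector of real variables, $A = A'$ a symmetric k × k matrix of constants, and $\frac{\partial}{\partial X}$ the k × 1 vector of partial derivative operators. Then

$$u = X'AX, \ A = A' \ \Rightarrow\ \frac{\partial u}{\partial X} = 2AX, \tag{2.1.10}$$

and

$$\frac{\partial}{\partial X}\frac{\partial u}{\partial X'} = 2A. \tag{2.1.11}$$

If u = X'AX where A is not assumed to be symmetric then

$$u = X'AX \ \Rightarrow\ \frac{\partial u}{\partial X} = (A + A')X \tag{2.1.12}$$

and

$$\frac{\partial}{\partial X}\frac{\partial u}{\partial X'} = A + A'. \tag{2.1.13}$$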
Before concluding this section we will define powers of square matrices. 


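Definition 2.1.13 (Powers of a square matrix). Let A be an n × n matrix. Then the r-th power of A for r = 0, 1, ... (non-negative integers) is defined as

$$A^r = AA \cdots A$$

(product taken r times), with $A^0 = I$.

As an example,

$$A = \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} \ \Rightarrow\ A^r = \begin{bmatrix} d_1^r & 0 \\ 0 & d_2^r \end{bmatrix}.$$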


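Definition 2.1.14 (Idempotent matrices). If $A = A^2$ then A is called an idempotent matrix when A is non-null.

As examples we have,

(i) $I = I^2 \Rightarrow I$ is an idempotent matrix.

(ii) Consider the n × 1 vector of unities, denoted by J. Then

$$J = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad J'J = n, \quad JJ' = \begin{bmatrix} 1 & \cdots & 1 \\ \vdots & & \vdots \\ 1 & \cdots & 1 \end{bmatrix}.$$

Let $A = \frac{1}{n}JJ'$ then

$$A^2 = \frac{1}{n^2}J(J'J)J' = \frac{n}{n^2}JJ' = \frac{1}{n}JJ' = A.$$

(iii) $$A = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \quad A^2 = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix} = A.$$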
Thus A is idempotent. We can construct many such examples of idempotent matrices. 
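The idempotent examples above can be verified mechanically. The following is a small sketch in plain Python; the helper `matmul` and the choice n = 4 are assumptions made only for illustration.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = 4                                           # illustrative size
A = [[Fraction(1, n)] * n for _ in range(n)]    # A = (1/n) J J'
C = [[0, 0],
     [1, 1]]                                    # the 2 x 2 example

print(matmul(A, A) == A)    # A^2 = A
print(matmul(C, C) == C)    # C^2 = C
```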
Definition 2.1.15 (Nilpotent matrix of order r). If a matrix $B \neq O$ is such that $B^r = O$, $B^{r-1} \neq O$, for some fixed r, r = 2, 3, ..., then B is called a nilpotent matrix of order r, where r is the smallest integer for which $B^r$ becomes null.


We can construct many examples of nilpotent matrices of various orders. For ex- 


ample, 

0 1 3 fo ot) [O- 1] fo o 

ü B-19 EL alle a |" |o NEC 
nilpotent of order 2; 


(ii) Cz | 


Exercises 2.1 


2.1.1. Compute 2A - B + iC for the following matrices, wherever it is defined:


(a) A=(1,-1,2,3), B=(2,5,0), Cz(-L1) 


2 0 
0 0 
by. visu 2a Se es 
(b) ( 3) 1 adus 
E 0 
ied. 2d 0 12 0 4 6 
©) a-l 1 al sel 1 2 ir ei dr 


1 0 ona 1 0 
(d) a=; Al sl al zd 1 


2.1.2. Compute AB and BA, wherever defined, for the matrices A and B in Exercise 2.1.1. Is AB = BA in general?


2.1.3. By taking an arbitrary m × n matrix $A = (a_{ij})$ show that OA = O, AO = O whenever the products are defined, where O indicates a null matrix.

2.1.4. By taking an arbitrary m × n matrix $A = (a_{ij})$ show that $I_mA = A$ and $AI_n = A$ where $I_m$ and $I_n$ are m × m and n × n identity matrices.


2.1.5. Construct a 2 × 2 matrix A such that (a) $A^2 = O$ but $A \neq O$, (b) $A^2 = A$, $A \neq O$, $A \neq I$.


2.1.6. Consider a general 2 x 2 matrix A and consider 
O 1 O0 -1 
P, = B P, € > 
1 0 1 0 
O 1 0 -1 
P} = ys ; 
e y rm o) 


(a) Premultiply A with each of P,,...,P, and explain in each case what happens to 
the matrix A. 


(b) Postmultiply A with each of P,, ..., P, and explain in each case what happens to A. 


2.1.7. Consider the 2 x 2 matrix A in Exercise 2.1.6 as two ordered points in a plane. 
Explain geometrically what is seen in (a) and (b) of Exercise 2.1.6. 


2.1.8. Let

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Compute AA' and A'A.


2.1.9. Construct two examples each of two matrices A and B where (i) AB # BA, 
(ii) AB = BA. Exclude trivial cases involving identity, null and diagonal matrices. 


a-l; a 


Construct a matrix B such that AB = I, BA = I where I is a 2 x 2 identity matrix. 


2.1.10. Let 


2.1.11. Write the following systems of linear equations in matrix notation: 


(à  X-x;4x,-2 

2X1 + X) — 5X3 = 4; 

(b) 2x; 3X4 +X3 -X4 =1 
Xi +X +X3 +X4=7 

3X, + 2X) - X3 -X4 =5 


Xi — X2 -X3 +X4 — 4. 
2.1.12. If the right sides in (a) and (b) in Exercise 2.1.11 are replaced by variables $y_1, y_2, \ldots$ then write the transformation in the form Y = AX and identify Y, X, A in each case.


2.1.13. Evaluate the Jacobian matrix in Exercise 2.1.12 (b).


2.1.14. Write the following quadratic forms in the form u = X'AX where (i) $A = A'$, (ii) $A \neq A'$:

(a) u2 2x3 xi —X3 +x} - 2x15 7 X9Xi, + 3X3X4; 

(b) us 2x +x} + 2x3 - 2X1X4 + XX2. 


2.1.15. If $\frac{\partial}{\partial X}$ denotes the column vector of partial differential operators evaluate $\frac{\partial u}{\partial X}$ and $\frac{\partial}{\partial X}\frac{\partial u}{\partial X'}$ for (a) and (b) in Exercise 2.1.14. Write the final forms in matrix notation (one case each where the matrix of the quadratic form is (i) symmetric, (ii) not symmetric).


2.1.16. Let

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.$$

(a) Construct a 3 × 1 non-null vector B such that AB = O.
(b) Construct a 3 × 2 non-null matrix B such that AB = O, if possible.


2.1.17. Let A and B be two matrices where AB is defined. Suppose that the second row of A is 5 times the first row. Then show that, whatever be B, the second row of AB is 5 times the first row of AB.


2.2 More properties of matrices 


A scalar function of the elements of a square matrix called the trace of the matrix is 
very useful in some practical applications. 


Definition 2.2.1 (The trace of a matrix). It is defined only for square matrices. Let $A = (a_{ij})$ be n × n. Then the trace of A, denoted by tr(A), is the sum of the leading diagonal elements of A. That is,

$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn}.$$


For example, 


O | = tr(A)=1+3 + (-5) =-1; 




A- |a j| > 09-2:3-5 
0 3 


0 O0 
4-5 o| > two. 


The following properties follow from the definition itself: 


tr(A’) =tr(A) (2.2.1) 
tr(AB) = tr(BA) (2.2.2) 


whenever AB and BA are defined. Note that, in general, AB need not be equal to BA 
but their traces are equal. Extending (2.2.2) we have 


tr(ABC) = tr(CAB) = tr(BCA) (2.2.3) 


even though, in general, $ABC \neq CAB \neq BCA$.
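Property (2.2.2) holds even when A and B are rectangular, as long as both AB and BA are defined. A minimal sketch in plain Python follows; the matrices and the helper names `matmul` and `trace` are illustrative assumptions, not taken from the text.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the leading diagonal elements of a square matrix."""
    if any(len(row) != len(M) for row in M):
        raise ValueError("trace is defined only for square matrices")
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 0],
     [-1, 1, 3]]        # 2 x 3, illustrative
B = [[2, 1],
     [0, -1],
     [1, 4]]            # 3 x 2, illustrative

AB = matmul(A, B)       # 2 x 2
BA = matmul(B, A)       # 3 x 3
print(trace(AB), trace(BA))
```

Here AB and BA are not even of the same order, yet their traces coincide.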
Let A be a square matrix. For some matrices A we can construct another matrix B such that

$$AB = I, \quad BA = I$$

where I is the identity matrix.

Definition 2.2.2 (Regular inverse). If there exists a matrix, denoted by $A^{-1}$, such that

$$AA^{-1} = I, \quad A^{-1}A = I$$

then $A^{-1}$ is called the regular inverse, or simply the inverse, of A.


For example, 


1 0 


1 
0 -3 -3 
aalt pesce T, AA! =I, A^A-I 
1 2 -1 1 
A= ee Adz- : BI AA!-I, A'A=I. 
-1 1 1 1 


Later we will discuss a systematic way of evaluating the inverse of a given matrix, 
whenever the inverse exists. The following properties are evident from the definition 
itself. 
(i) If A is an n × n diagonal matrix with the diagonal elements $d_1 \neq 0, d_2 \neq 0, \ldots, d_n \neq 0$ then $A^{-1}$ is an n × n diagonal matrix with the diagonal elements $\frac{1}{d_1}, \frac{1}{d_2}, \ldots, \frac{1}{d_n}$.

(ii) A diagonal matrix A with at least one of the diagonal elements zero has no inverse $A^{-1}$.

(iii) A triangular matrix A (lower or upper) with at least one of the diagonal elements zero has no inverse $A^{-1}$.

(iv) For a given matrix A if $A^{-1}$ exists then it is unique. That is, if AB = I, BA = I as well as AC = I, CA = I then $B = C = A^{-1}$.

(v) A null matrix has no inverse.

(vi) If A and B are square matrices of the same order with $A^{-1}$ and $B^{-1}$ existing then

$$(AB)^{-1} = B^{-1}A^{-1}.$$

This result is easily proved by evaluating $(AB)(B^{-1}A^{-1})$ as well as $(B^{-1}A^{-1})(AB)$. Note that

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$

and similarly

$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I.$$

This result can be extended to any number of n × n matrices having inverses. That is,

$$(A_1A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1}A_1^{-1}. \tag{2.2.4}$$

We have already seen a similar result on transposes. That is,

$$(A_1A_2 \cdots A_k)' = A_k' \cdots A_2'A_1'. \tag{2.2.5}$$

An application of the regular inverse in solving systems of linear equations can be stated as follows: Consider a system of n linear equations in n unknowns, written in matrix notation as AX = b. If the coefficient matrix A has a regular inverse then premultiplying both sides by $A^{-1}$ we get the unique solution of the system.

$$AX = b \text{ with } A^{-1} \text{ existing} \ \Rightarrow\ X = A^{-1}b. \tag{2.2.6}$$


Example 2.2.1. By using the illustrative example to Definition 2.2.2 solve the following systems of linear equations by inspection:

(a) $x_1 + x_2 = 2$
    $x_1 + 2x_2 = 0$.

(b) $2x_1 - x_2 = 1$
    $-x_1 + x_2 = 3$.
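Solution 2.2.1. (a) Writing the system as AX = b we have

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ 0 \end{bmatrix}.$$

But from the illustrative example

$$A^{-1} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}.$$

Hence

$$X = A^{-1}b = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \end{bmatrix} \ \Rightarrow\ x_1 = 4 \text{ and } x_2 = -2.$$

(b) Writing the system as AX = b we have

$$A = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$

From the above (a) itself

$$A^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad X = A^{-1}b = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 4 \\ 7 \end{bmatrix} \ \Rightarrow\ x_1 = 4 \text{ and } x_2 = 7.$$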


Computing $A^{-1}$ first and then solving the system of linear equations by using the formula $X = A^{-1}b$ is not the easiest way of solving the system even when $A^{-1}$ exists. We can see from the above examples that if A is an n × n matrix with n > 3 then by inspection we may not be able to come up with $A^{-1}$ even when $A^{-1}$ exists. Another simpler way of solving systems of linear equations by using a procedure called elementary transformations will be considered later on.
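The two systems of Example 2.2.1 can be solved in code by the formula X = A⁻¹b. The sketch below is only an illustration: it uses the familiar closed form for the inverse of a 2 × 2 matrix (in terms of ad − bc), which this section has not derived and which is assumed here as a convenience; the function names are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 matrix via the closed form (assumed, not derived here)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("no regular inverse exists")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

# (a) x1 + x2 = 2, x1 + 2 x2 = 0
A1 = [[1, 1], [1, 2]]
sol_a = matvec(inverse_2x2(A1), [2, 0])

# (b) 2 x1 - x2 = 1, -x1 + x2 = 3
A2 = [[2, -1], [-1, 1]]
sol_b = matvec(inverse_2x2(A2), [1, 3])

print(sol_a, sol_b)
```

The inverse of the matrix in (a) is the matrix in (b) and vice versa, which is why both systems can be solved from the single illustrative example.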


A result on trace which will be useful in many applied problems is on the trace of 
a product of the type AA' where A need not be a square matrix. If A is m x n then A' is 
n x m and AA’ is m x m. The trace of an m x m matrix is defined. Similarly A’ is nx m 
which makes A'A an n x n matrix. Trace is again defined. By straight multiplication 
and then summing up the leading diagonal elements we can establish the following 
result: 


(vii) Let A be any m × n matrix. Then

$$\operatorname{tr}(AA') = \operatorname{tr}(A'A) = \sum_{i}\sum_{j} a_{ij}^2 = \text{sum of squares of all elements in } A.$$

The following results can be established easily.

(viii) $\operatorname{tr}(AA^{-1}) = \operatorname{tr}(I) = n$ when A is an n × n matrix with $A^{-1}$ existing.

(ix) If A is idempotent and if $A \neq I$ then $A^{-1}$ does not exist.

(x) If A is a nilpotent matrix of order r, for some r, then $A^{-1}$ does not exist. (A null matrix is not counted among nilpotent matrices.)


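Example 2.2.2. Let the elements of an m × n matrix X be all functionally independent (distinct) real variables $x_{ij}$. Let $\int_X$ denote the multiple integral over all the variables $x_{ij}$ and dX the wedge product of all differentials in X. Then evaluate the integral

$$\gamma = \int_X e^{-\operatorname{tr}(XX')}\,dX.$$

Solution 2.2.2. Since

$$\operatorname{tr}(XX') = \sum_{i}\sum_{j} x_{ij}^2$$

(sum of squares of all elements in X), we have

$$\gamma = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\sum_{i,j} x_{ij}^2}\,dx_{11} \wedge \cdots \wedge dx_{mn}.$$

Since all the integrals over the individual variables are identical we need to evaluate only one integral. Let

$$\delta = \int_{-\infty}^{\infty} e^{-x^2}\,dx = 2\int_{0}^{\infty} e^{-x^2}\,dx$$

(since $e^{-x^2}$ is an even function and since the integral exists). Put $y = x^2$, so that $x = y^{1/2}$ and $dx = \frac{1}{2}y^{-1/2}\,dy$. Then

$$\delta = \int_{0}^{\infty} y^{\frac{1}{2}-1}e^{-y}\,dy = \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.$$

Therefore

$$\gamma = (\sqrt{\pi})^{mn} = \pi^{mn/2}.$$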


Note. The integral representation of a gamma function is the following: [A gamma function is defined in different ways.]

$$\Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x}\,dx \quad\text{for } \Re(\alpha) > 0 \tag{2.2.7}$$

where $\Re(\cdot)$ denotes the real part of $(\cdot)$. The notation $\Gamma(\alpha)$ (gamma alpha) is a standard notation. It is a function notation, function of α, gamma of alpha. The integral in (2.2.7) exists even for complex parameter values of α provided the real part of this complex parameter α is positive. Then, of course, it exists for real α such that α > 0. For defining a gamma function this condition is not necessary. Only for representing a gamma function as an integral the condition $\Re(\alpha) > 0$ is needed, otherwise $\Gamma(\alpha)$ exists for all values of $\alpha \neq 0, -1, -2, \ldots$. Two immediate properties of a gamma function are the following:

$$\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) \quad\text{for } \Re(\alpha - 1) > 0 \tag{2.2.8}$$

and

$$\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \tag{2.2.9}$$

Observe that (2.2.8) can be recursively applied to evaluate $\Gamma(\alpha)$ when α is a positive integer. That is,

$$\Gamma(n) = (n-1)! \quad\text{for } n = 1, 2, \ldots. \tag{2.2.10}$$
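Properties (2.2.8) to (2.2.10) can be checked against the gamma function in Python's standard `math` module. This is a hedged numerical sketch: the helper `gamma_integer` is a hypothetical name introduced here, and comparisons use a floating-point tolerance rather than exact equality.

```python
import math

def gamma_integer(n):
    """Gamma(n) for a positive integer n, by repeated use of (2.2.8) with Gamma(1) = 1."""
    value = 1
    for k in range(1, n):
        value *= k          # accumulates (n - 1)!
    return value

# (2.2.9): Gamma(1/2) = sqrt(pi)
print(math.gamma(0.5), math.sqrt(math.pi))

# (2.2.10): Gamma(n) = (n - 1)!
print([gamma_integer(n) for n in range(1, 7)])

# (2.2.8) at a non-integer argument chosen for illustration
alpha = 3.7
print(math.gamma(alpha), (alpha - 1) * math.gamma(alpha - 1))
```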
If the inverses of A and B exist and if A and B are n × n matrices then the following properties hold:

(xi) $(A^m)^{-1} = (A^{-1})^m = A^{-m}$, m = 0, 1, 2, ....

(xii) $(A^mB^r)^{-1} = (B^r)^{-1}(A^m)^{-1} = (B^{-1})^r(A^{-1})^m$, m, r = 0, 1, 2, ....


We have seen that the positive integer powers of a square matrix are defined. It is nat- 
ural to ask the question: is the square root of a square matrix A defined? Can we find 
a matrix B such that BB = B? = A? Then, naturally, we can say that B is a square root 
of A. Can we find such a square root for a given matrix A and when we can find one 
such B, is that B going to be unique? Let us examine this a little bit further. Consider 
the following matrices: 


Note that 
The squares of $A_1, A_2, A_3, A_4$ are all equal to the 2 × 2 identity matrix $I_2$. Hence $A_1, \ldots, A_4$ all qualify to be a square root of $I_2$. One of the simplest matrices that we can consider is an identity matrix. We see that when we can find a square root the square root is not unique. In general, there need not exist a matrix B such that $B^2 = A$ for a given matrix A, and when such a B exists it need not be unique. Hence we will
not deal with fractional powers of matrices in the following sections. The square root 
will be explored further after discussing something called the eigenvalues of a matrix 
later on. 
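The non-uniqueness of a square root of the identity matrix is easy to confirm by direct multiplication. The candidate matrices in the sketch below are chosen here for illustration and are not necessarily the ones the text has in mind; `matmul` is a hypothetical helper.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

I2 = [[1, 0], [0, 1]]

# Illustrative candidates; each one squares to the 2 x 2 identity matrix.
candidates = [
    [[1, 0], [0, 1]],
    [[-1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[2, 3], [-1, -2]],
]

roots = [B for B in candidates if matmul(B, B) == I2]
print(len(roots), "distinct square roots of I found")
```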


2.2.1 Some more practical situations 


A number of practical situations, where matrices come in naturally, are already dis- 
cussed. Some more will be listed here which involve only sums and products of ma- 
trices. The student is urged to take note of all the practical situations listed so far, and 
also the ones to be listed later, because we are going to enlarge on each of them later 
on. 


Example 2.2.3 (Center of gravity and moment of inertia). Some concepts connected with physics and statistics are the mean values and variances. Some of these will be examined here. Consider a set of numbers $x_1, \ldots, x_n$ (such as the heights of students in a class, incomes of wage-earners in a city and so on). The average

$$\bar{x} = \frac{x_1 + \cdots + x_n}{n} = \frac{1}{n}J'X = \frac{1}{n}X'J \tag{2.2.11}$$

where $J' = (1, 1, \ldots, 1)$ and $X' = (x_1, \ldots, x_n)$. Then

$$(\bar{x})^2 = \bar{x}\,\bar{x} = \bar{x}(\bar{x})'$$

(note that $\bar{x}' = \bar{x}$ since $\bar{x}$ is 1 × 1), so that

$$(\bar{x})^2 = \left(\frac{1}{n}\right)^2 X'JJ'X = \frac{1}{n}X'BX \tag{2.2.12}$$

(a quadratic form in X) where

$$B = \frac{1}{n}JJ' = \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}.$$

One interesting property of B is that

$$B = B^2$$

which means that B is idempotent.
If $x_1, \ldots, x_n$ are the values taken by a discrete real random variable x with the corresponding probabilities $p_1, \ldots, p_n$, $p_i > 0$, i = 1, ..., n, $p_1 + \cdots + p_n = 1$ then the mean value of the random variable x, denoted as μ = E(x), is given by

$$\mu = \sum_{i=1}^{n} x_ip_i = X'P = P'X = \frac{\sum_{i=1}^{n} x_ip_i}{\sum_{i=1}^{n} p_i} \quad \left(\text{since } \sum_{i=1}^{n} p_i = 1\right) \tag{2.2.13}$$

where $X' = (x_1, \ldots, x_n)$ and $P' = (p_1, \ldots, p_n)$. The expression in (2.2.13) is the mean value of the random variable x in statistical literature and it is the center of gravity of the system X when P is the vector of weights or forces and so on, in physics. The variance of the real discrete random variable x, denoted by $\sigma^2$, is defined as

$$\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2p_i$$

which is also the moment of inertia of the system X and P in physics. When $p_1 = \cdots = p_n = \frac{1}{n}$ we have

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - (\bar{x})^2.$$

But note that $\frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{1}{n}X'X$ and $(\bar{x})^2 = \frac{1}{n}X'BX$ where B is defined in (2.2.12). Therefore

$$\sigma^2 = \frac{1}{n}X'X - \frac{1}{n}X'BX = \frac{1}{n}X'[I - B]X \tag{2.2.14}$$

where

$$I - B = \begin{bmatrix} 1 - \frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1 - \frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1 - \frac{1}{n} \end{bmatrix}. \tag{2.2.15}$$

Observe that

$$(I - B)^2 = (I - B)(I - B) = I - B - B + B^2 = I - 2B + B = I - B$$

since $B = B^2$, B being idempotent. (The student may verify by directly computing $(I - B)^2$ also.) This means that I - B is also idempotent. Further,

$$(I - B)B = B - B^2 = B - B = O$$


since B is idempotent. The properties that B and I - B are idempotent and that 
(I - B)B - O are the fundamental results in the analysis of variance principle, re- 
gression analysis, design of experiments, independence of quadratic forms and in 
many other similar topics in statistics, econometrics and related areas. 
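The three facts used above, namely that I − B is idempotent, that (I − B)B = O, and that the variance equals (1/n)X'(I − B)X, can be confirmed on a small data set. The sketch below uses exact fractions; the data values, n = 4, and the helper `matmul` are assumptions chosen only for illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

n = 4
I = [[F(int(i == j)) for j in range(n)] for i in range(n)]
B = [[F(1, n)] * n for _ in range(n)]                        # B = (1/n) J J'
M = [[I[i][j] - B[i][j] for j in range(n)] for i in range(n)]  # M = I - B

x = [2, 4, 4, 6]                                             # illustrative data
mean = F(sum(x), n)
var_direct = sum((xi - mean) ** 2 for xi in x) / n           # (1/n) sum (x_i - mean)^2
var_quad = sum(x[i] * M[i][j] * x[j]
               for i in range(n) for j in range(n)) / n      # (1/n) X'(I - B)X

print(matmul(M, M) == M)            # I - B is idempotent
print(matmul(M, B))                 # the null matrix
print(var_direct, var_quad)
```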


Definition 2.2.3 (Orthogonal matrices). If two non-null matrices A and B are such that AB = O then A and B are said to be orthogonal to each other.

This is a generalization of the concept of orthogonal vectors in Chapter 1. For example,

(a) $$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \ \Rightarrow\ AB = O;$$

(b) $$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 3 \\ 0 & 0 \\ -1 & -3 \end{bmatrix} \ \Rightarrow\ AB = O.$$

Note that when AB = O every column vector in B is orthogonal to every row vector in A or the angle between these vectors is π/2. As an immediate consequence of this definition we can observe the following result:


(xiii) In a system of linear equations AX = O every solution vector X is orthogonal 
to the rows of A, where A is mx n and X isn x 1. 


Definition 2.2.4 (Orthonormal matrix). If a square matrix A is such that AA’ = I, 
A'A =I then A is called an orthonormal matrix or an element of the orthogonal group. 


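For example,

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$

$$AA' = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

since $\cos^2\theta + \sin^2\theta = 1$. Hence A is an orthonormal matrix. Consider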


1 
B- 


x42 
L 
v2 


v2 
1 


1 
v2 


> BB' =I, B'B-I 
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which means B is orthonormal. Now, take 


da yb d. 

8 wW y 

1 2 1 1 ir 
Colm We) Wei ee eE 

E o ead. 

v2 v2 


which means C is orthonormal. 
The following properties are immediate: 


(xiv) If A is an n x n orthonormal matrix then (a) the length of each row vector is 
unity and the row vectors are orthogonal to each other, (b) the length of each col- 
umn vector is unity and the column vectors are orthogonal to each other. 


(xv) If A is n × n and if AA' = I then $A^{-1} = A'$ and A'A = I.


This is a very important result which has consequences in linear transformations, or- 
thogonal transformations, reductions of quadratic forms and so on. 


(xvi) An n x n orthonormal matrix A represents a rotation of the coordinate axes. 


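This can be easily noticed geometrically as well as algebraically by looking at the transformation

$$Y = AX,$$

where

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad AA' = I, \quad A'A = I.$$

If the coordinate axes are rotated through the angle θ then the point $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ in the original axes of coordinates becomes $\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}$ in the new axes of coordinates.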

Definition 2.2.5 (A semi-orthonormal matrix). If an m × n, m < n, rectangular matrix A is such that $AA' = I_m$ then A is called a semi-orthonormal matrix or an element of the Stiefel manifold, and if m = n then it is a full orthonormal matrix.

For example,

$$A = (\cos\theta, -\sin\theta) \ \Rightarrow\ AA' = I_1 = 1 \ \Rightarrow\ A \text{ is semi-orthonormal};$$


a 
an 


1 9 -4 


1 
sa|? Bu NB 
2 v2 


| = BP' =I, = B issemi-orthonormal. 
Observe that when A is m × n such that $AA' = I_m$ then each row has length 1 and the rows are orthogonal whereas the columns do not have these properties when m < n, which means that when m < n, $AA' = I_m$ does not imply that $A'A = I_n$, which can be verified from the above illustrative examples.
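The remark that AA' = I_m does not force A'A = I_n when m < n can be seen on a 1 × 2 matrix with rational entries. The matrix below, with entries 3/5 and 4/5, is chosen here for illustration and is not from the text; `transpose` and `matmul` are hypothetical helper names.

```python
from fractions import Fraction as F

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[F(3, 5), F(4, 5)]]            # 1 x 2 row of unit length (9/25 + 16/25 = 1)

AAt = matmul(A, transpose(A))       # 1 x 1
AtA = matmul(transpose(A), A)       # 2 x 2

print(AAt)      # the 1 x 1 identity
print(AtA)      # not the 2 x 2 identity
```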


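Example 2.2.4 (The covariance matrix). The covariance matrix or the variance-covariance matrix in statistical theory, denoted by V, is defined as follows:

$$V = E[(X - \mu)(X - \mu)'] = E(XX') - \mu\mu'$$

where E denotes the expected value, μ = E(X) and X is a p × 1 vector of real random variables. This matrix represents the configuration of variances and covariances in the vector random variable X. For example, let the joint density of $x_1$ and $x_2$, $X' = (x_1, x_2)$, be

$$f(X') = f(x_1, x_2) = x_1 + x_2, \quad 0 \le x_1 \le 1, \ 0 \le x_2 \le 1$$

and f(X') = 0 elsewhere. Then

$$X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad E(X) = \begin{bmatrix} E(x_1) \\ E(x_2) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} = \mu,$$

$$XX' = \begin{bmatrix} x_1^2 & x_1x_2 \\ x_2x_1 & x_2^2 \end{bmatrix}, \quad E(XX') = \begin{bmatrix} E(x_1^2) & E(x_1x_2) \\ E(x_2x_1) & E(x_2^2) \end{bmatrix}.$$

$$E(x_1) = \int_0^1\int_0^1 x_1(x_1 + x_2)\,dx_1 \wedge dx_2 = \int_0^1 x_1\left(x_1 + \tfrac{1}{2}\right)dx_1 = \frac{7}{12} = E(x_2)$$

due to symmetry, and

$$E(x_1^2) = \int_0^1\int_0^1 x_1^2(x_1 + x_2)\,dx_1 \wedge dx_2 = \int_0^1 x_1^2\left(x_1 + \tfrac{1}{2}\right)dx_1 = \frac{5}{12} = E(x_2^2)$$

due to symmetry. Further,

$$E(x_1x_2) = \int_0^1\int_0^1 x_1x_2(x_1 + x_2)\,dx_1 \wedge dx_2 = \frac{1}{3}.$$

Then the covariance matrix

$$V = \begin{bmatrix} E(x_1^2) & E(x_1x_2) \\ E(x_2x_1) & E(x_2^2) \end{bmatrix} - \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}(\mu_1, \mu_2) = \begin{bmatrix} \frac{5}{12} & \frac{1}{3} \\ \frac{1}{3} & \frac{5}{12} \end{bmatrix} - \begin{bmatrix} \frac{7}{12} \\ \frac{7}{12} \end{bmatrix}\left(\tfrac{7}{12}, \tfrac{7}{12}\right) = \begin{bmatrix} \frac{11}{144} & -\frac{1}{144} \\ -\frac{1}{144} & \frac{11}{144} \end{bmatrix}.$$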
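Example 2.2.5 (Maxima/minima). When a scalar function

$$f = f(x_1, \ldots, x_n)$$

of many real variables $x_1, \ldots, x_n$ is considered the critical points are available by solving $\frac{\partial f}{\partial X} = O$ where $\frac{\partial}{\partial X} = \left(\frac{\partial}{\partial x_1}, \ldots, \frac{\partial}{\partial x_n}\right)'$ as seen in Chapter 1. One can check for maxima/minima at these critical points by evaluating the matrix $\frac{\partial}{\partial X}\frac{\partial f}{\partial X'}$ at these critical points. For example, let

$$f = x_1^2 + 2x_2^2 + 2x_1x_2 - x_1 - 2x_2 + 8.$$

Then

$$\frac{\partial f}{\partial X} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 - 1 \\ 2x_1 + 4x_2 - 2 \end{bmatrix}.$$

Hence

$$\frac{\partial f}{\partial X} = O \ \Rightarrow\ \begin{bmatrix} 2x_1 + 2x_2 - 1 \\ 2x_1 + 4x_2 - 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \ \Rightarrow\ x_1 = 0, \ x_2 = \tfrac{1}{2},$$

so that there is only one critical point. The operator

$$\frac{\partial}{\partial X}\frac{\partial}{\partial X'} = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} & \cdots & \frac{\partial^2}{\partial x_1\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2}{\partial x_n\partial x_1} & \cdots & \frac{\partial^2}{\partial x_n^2} \end{bmatrix}$$

which in our example, with n = 2 and operating on f, gives

$$\frac{\partial}{\partial X}\frac{\partial f}{\partial X'} = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}.$$

Since this matrix here is free of $x_1$ and $x_2$ this matrix evaluated at the critical point is itself. This matrix is positive definite and hence the critical point corresponds to a minimum. [Definiteness of matrices will be considered later on.]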
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Example 2.2.6 (Transition probability matrix). A worker finds that his boss has three 
stages of her mood, “pleasant”, “tolerable”, “intolerable”. If the boss is in a given 
mood in the morning it can change to one of the three stages by the evening. That is, 
for example, “pleasant” can change into “pleasant” or “tolerable” or “intolerable”. Let $p_{ij}$ be the chance (probability) that the i-th stage of the mood in the morning changes to the j-th stage of the mood by the evening. Then the 3 × 3 matrix $P = (p_{ij})$ is a transition probability matrix. For example, suppose

$$P = \begin{bmatrix} 0.5 & 0.4 & 0.1 \\ 0.3 & 0.6 & 0.1 \\ 0.1 & 0.2 & 0.7 \end{bmatrix}$$

which in terms of the various stages of the mood is the following:

                                evening
                      pleasant  tolerable  intolerable
morning  pleasant       0.5       0.4        0.1
         tolerable      0.3       0.6        0.1
         intolerable    0.1       0.2        0.7

For example the chance of going from “tolerable” to “pleasant” is 0.3 or 30%, whereas going from “tolerable” to “intolerable” is 0.1 or 10%. The chance of going from “pleasant” to “pleasant” is 0.5 and that from “intolerable” to “intolerable” is 0.7. In general, if there are k stages and if $p_{ij}$ is the probability of going from stage i to stage j then the transition probability matrix is

$$P = (p_{ij}) = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1k} \\ \vdots & \vdots & & \vdots \\ p_{k1} & p_{k2} & \cdots & p_{kk} \end{bmatrix}$$

where the sum of each row is unity, $\sum_{j=1}^{k} p_{ij} = 1$ for each i = 1, 2, ..., k and each entry $p_{ij} \ge 0$. If the probability of transition from the j-th stage to the i-th stage is denoted by $p_{ij}$ then we have the transpose of the P above. In this case the sum of the elements in each column will be 1. Such a matrix P where either the elements in each row sum to 1 or the elements in each column sum to 1, but not both, is also called a singly stochastic matrix. If the elements in each row as well as in each column sum to 1, that is, $\sum_{j} p_{ij} = 1$ for each i as well as $\sum_{i} p_{ij} = 1$ for each j, then such a matrix is called a doubly stochastic matrix.


For our example of the boss let us examine one interesting aspect. Suppose that there is no mood change in the night. She will be in the same mood in the morning as the one of the previous evening. If she is in a “pleasant” mood in the first day morning what is the chance that she will be in a “pleasant” mood in the second day evening? Let the three stages “pleasant”, “tolerable”, and “intolerable” be denoted by 1, 2, 3 respectively. If she is in stage 1 in the first day morning it can go from 1 to 1 or 1 to 2 or 1 to 3 by the evening. Then on the second day it can go from 1 to 1, given that she was in stage 1 by the previous evening, or 2 to 1, given that she was in stage 2 by the previous evening, or 3 to 1, given that she was in stage 3 by the previous evening. Thus the chance of finding her in stage 1 on the second evening is given by

$$(0.5)(0.5) + (0.4)(0.3) + (0.1)(0.1) = 0.38.$$

From stage 1 in the first morning to stage 2 in the second evening has the probability

$$(0.5)(0.4) + (0.4)(0.6) + (0.1)(0.2) = 0.46,$$

and so on. [Probability of the intersection of two events = conditional probability multiplied by the marginal probability.] The transition probability matrix for the second evening starting with P in the first day morning is then

$$PP = P^2 = \begin{bmatrix} 0.38 & 0.46 & 0.16 \\ 0.34 & 0.50 & 0.16 \\ 0.18 & 0.30 & 0.52 \end{bmatrix}.$$

The transition probability matrix from the first day morning to the k-th day evening is then $P^k$. There are several interesting aspects that can be studied by using transition probability matrices. Some of these will be considered later on in the coming chapters.

Definition 2.2.6. If A and B are two non-null matrices such that AB = BA then A and B are said to commute or are said to be commutative.

Note that in general, whenever AB and BA are defined, $AB \neq BA$. But in some cases AB = BA. For example,

$$IA = AI = A$$

means that any n × n matrix A and the n × n identity matrix are commutative. Let $D_1$ and $D_2$ be two diagonal matrices of the same order (which includes identity and scalar matrices) then

$$D_1D_2 = D_2D_1.$$

(xvii) Diagonal matrices of the same order are commutative.

If A is an arbitrary matrix and D a diagonal matrix (not equal to a scalar matrix) then, in general, $AD \neq DA$. Let


1 1 1 
1 1: od 

ÁA-|. . .p Be(bgz-P' 
1 1 1 


then AB = BA (or A and B commute).


2.2.2 Pre and post multiplications by diagonal matrices 


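Let $A = (a_{ij})$ be an $n \times n$ matrix and $D = \mathrm{diag}(d_1, \ldots, d_n)$ be a diagonal matrix. The effects of pre and post multiplications of the matrix A by the diagonal matrix D are quite interesting. The student should memorize these results because they come in handy in many structural decompositions of matrices:

$$
DA = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
= \begin{bmatrix} d_1 a_{11} & d_1 a_{12} & \cdots & d_1 a_{1n} \\ d_2 a_{21} & d_2 a_{22} & \cdots & d_2 a_{2n} \\ \vdots & \vdots & & \vdots \\ d_n a_{n1} & d_n a_{n2} & \cdots & d_n a_{nn} \end{bmatrix}.
$$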


(xviii) Premultiplication of a matrix A by a diagonal matrix D is equivalent to mul- 
tiplying each row of A by the corresponding diagonal elements in D. 


That is, the first row of A is multiplied by the first diagonal element in D, the second 
row of A is multiplied by the second diagonal element in D and so on. Now, let us see 
what happens if we postmultiply A with D: 


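$$
AD = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}
= \begin{bmatrix} d_1 a_{11} & d_2 a_{12} & \cdots & d_n a_{1n} \\ d_1 a_{21} & d_2 a_{22} & \cdots & d_n a_{2n} \\ \vdots & \vdots & & \vdots \\ d_1 a_{n1} & d_2 a_{n2} & \cdots & d_n a_{nn} \end{bmatrix}.
$$

(xix) Postmultiplication of A by the diagonal matrix D is equivalent to multiplying each column of A by the corresponding diagonal elements in D.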


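That is, the first column of A is multiplied by the first diagonal element in D, the second column of A is multiplied by the second diagonal element in D and so on. For example,

$$
\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 6 & 12 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 1 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 12 \end{bmatrix}.
$$

It is worth observing that the same properties hold if an $m \times n$ matrix A is premultiplied by an $m \times m$ diagonal matrix and postmultiplied by an $n \times n$ diagonal matrix.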
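The row-scaling and column-scaling effects can be seen directly on the 2 x 2 example. This is a small sketch in plain Python; the helpers `matmul` and `diag` are illustrative names, not part of the text.

```python
def matmul(A, B):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def diag(*d):
    """Diagonal matrix with the given diagonal elements."""
    return [[d[i] if i == j else 0 for j in range(len(d))] for i in range(len(d))]

A = [[1, 1],
     [2, 4]]
D = diag(2, 3)

DA = matmul(D, A)   # premultiplication: row i of A is multiplied by d_i
AD = matmul(A, D)   # postmultiplication: column j of A is multiplied by d_j

print(DA)   # [[2, 2], [6, 12]]
print(AD)   # [[2, 3], [4, 12]]
```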


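Example 2.2.7 (Covariance and correlation matrices). In statistical analysis the covariance between two real scalar random variables $x_i$ and $x_j$, denoted by $\sigma_{ij}$, has the following representation:

$$
\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j, \quad \sigma_i \neq 0,\ \sigma_j \neq 0
$$

where $\rho_{ij}$ (Greek letter rho) is the correlation between $x_i$ and $x_j$, and $\sigma_i$ and $\sigma_j$ are the standard deviations (measures of scatter) in $x_i$ and $x_j$ respectively ($\sigma_i^2$ is the variance of $x_i$). Then the covariance matrix or the variance-covariance matrix can be written in the following structural form:

$$
V = (\sigma_{ij}) = \begin{bmatrix} \rho_{11}\sigma_1^2 & \rho_{12}\sigma_1\sigma_2 & \cdots & \rho_{1n}\sigma_1\sigma_n \\ \vdots & \vdots & & \vdots \\ \rho_{n1}\sigma_n\sigma_1 & \rho_{n2}\sigma_n\sigma_2 & \cdots & \rho_{nn}\sigma_n^2 \end{bmatrix} = DPD
$$

where

$$
D = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n \end{bmatrix} \quad\text{and}\quad
P = \begin{bmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1n} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2n} \\ \vdots & \vdots & & \vdots \\ \rho_{n1} & \rho_{n2} & \cdots & \rho_{nn} \end{bmatrix},
$$

where V is the covariance matrix, D is a diagonal matrix of standard deviations and P is the matrix of correlations. (Incidentally $\rho_{ii} = 1$, $i = 1, \ldots, n$, which follows from the definition itself.)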
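The structural form V = DPD can be illustrated with two variables. In this sketch the standard deviations and the correlation are hypothetical numbers chosen only for illustration, and `matmul` is an illustrative helper.

```python
def matmul(A, B):
    """Row-by-column product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

sigma = [2.0, 3.0]   # hypothetical standard deviations of x1 and x2
rho = 0.5            # hypothetical correlation between x1 and x2

D = [[sigma[0], 0.0],
     [0.0, sigma[1]]]
P = [[1.0, rho],
     [rho, 1.0]]     # rho_ii = 1 on the diagonal

V = matmul(matmul(D, P), D)   # rows and columns of P scaled by the sigmas
print(V)   # [[4.0, 3.0], [3.0, 9.0]]
```

The diagonal of V holds the variances and the off-diagonal elements equal $\rho_{12}\sigma_1\sigma_2$, as in the representation above.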
Exercises 2.2


2.2.1. Compute the traces of the following matrices: 


0 -1 2 -1 0 #1 2 0 O 
(a) A=|2 3 1], B=|0 2 1], C=/]1 1 24; 
0 0 7 0.0 -2 O 1 -1 


(b) AB, BA; (c) ABC, CAB, BCA; (d) 2A - 5B, 2tr(A) - 5 tr(B). 


2.2.2. If $A = (a_{ij})$ is a 3 x 3 matrix obtain (a) $\mathrm{tr}(A^2)$, (b) $\mathrm{tr}(AA')$ and compare the results in (a) and (b).


2.2.3. If X is a p x p matrix which can be written as X = TT' where T is a lower triangular matrix, (a) compute the trace of X in terms of the elements of T; (b) can you represent every element in T as a function of the elements in X, and if so, is the representation unique? (c) What are the conditions on the elements of T so that the transformation X = TT' (it is a nonlinear transformation) is one-to-one, that is, every element in T can be uniquely written as a function of the elements in X and vice versa?


2.2.4. By inspection, write down the regular inverses, if they exist, for the following 
matrices: 


2 0 00 
21 
"m "' xt: 
3 5 E 3» WD 
5 7 2 8 
0 00 
ae eee 
= , D- 
Clo 0 o0 pe 
0.009 


2.2.5. Show that if A is lower (upper) triangular then its regular inverse, when it ex- 
ists, is also lower (upper) triangular. Verify the result for a general 3 x 3 lower (upper) 
triangular matrix. 


2.2.6. Taking the matrices in Exercise 2.2.1 (a) verify the following results: 
(a) (AB)' = B'A', (b) (ABC)' = C'B'A'.


2.2.7. Let 


us P des NES NECS E SE, 
n j^ 2 -1 


where A is n x n. Compute A?, A?, A, Al, B?, B’, BI, BM! (2. 63. (30. C43. What 
are $A^k$, $B^k$, $C^k$ for a general k?

2.2.8. $(AB)^2 \neq A^2B^2$ in general. But for some special situations $(AB)^2 = A^2B^2$. Construct non-null, non-diagonal 3 x 3 matrices A and B such that (a) $(AB)^2 \neq A^2B^2$, (b) $(AB)^2 = A^2B^2$.

2.2.9. Is $(A + B)^2 = A^2 + 2AB + B^2$ for general n x n matrices, n > 1? If not, what is the correct expansion formula? Derive the expansion formula for $(A + B)^3$.


2.2.10. The product $AB = C = (c_{ij})$ is defined in such a way that the (i,j)-th element in C, namely $c_{ij}$, is the dot product of the i-th row of A with the j-th column of B. Now, let $\alpha_1, \ldots, \alpha_n$ be the columns of A and $\beta_1, \ldots, \beta_n$ be the rows of B. Then obviously $\alpha_i\beta_j$ is an n x n matrix. Show that the product AB can also be written as a sum of such matrices in the following form:

$$
AB = \alpha_1\beta_1 + \alpha_2\beta_2 + \cdots + \alpha_n\beta_n.
$$


2.2.11. Construct different 3 x 3 matrices A, B, C, other than the ones in the text, such that (a) AA' = I, (b) B is semiorthonormal, (c) BC = O.


2.2.12. Let the illustrative orthonormal matrix in the text involving $\theta$ be denoted by

$$
P(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
$$

Let $P(\theta_i)$ be the same $P(\theta)$ with $\theta$ replaced by $\theta_i$. Then $P(\theta_1)$ and $P(\theta_2)$ represent rotations of the coordinate axes through angles $\theta_1$ and $\theta_2$ respectively and $P(\theta_1)P(\theta_2)$ represents the situation of first rotating through an angle $\theta_1$ and then rotating through an angle $\theta_2$. Show that

(a) $P(\theta_1)P(\theta_2) = P(\theta_1 + \theta_2)$.

(b) What is the geometrical interpretation of $P(-\theta)$?


2.2.13. Construct different 2 x 2 non-null matrices A, B, C with real elements such that (a) $A^2 = -I$, (b) $BC = -CB$.


2.2.14. Let A be a given n x n matrix and X an arbitrary n x n matrix such that AX = XA for all X. Then show that A is a scalar multiple of an identity matrix (scalar matrix).


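2.2.15. Show that in Example 2.2.7 the inverse of the covariance matrix, $V^{-1}$, whenever it exists, can be computed by the formula

$$
V^{-1} = \begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_n} \end{bmatrix} P^{-1}
\begin{bmatrix} \frac{1}{\sigma_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\sigma_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\sigma_n} \end{bmatrix}.
$$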


2.2.16. (a) If A is an m x n matrix and if A = -A then show that A is a null matrix.
(b) If c is a scalar (1 x 1 matrix) then the scalar multiplication cA is defined. Show that the matrix multiplication cA is defined only if m = 1.

2.2.17. If A is an n x n matrix and if A commutes with every n x n matrix then show that A is a scalar multiple of an identity matrix.


2.2.18. Prove that A is symmetric iff A' is symmetric and vice versa. 


2.2.19. Give two examples of different symmetric matrices A and B such that (1) AB is 
symmetric; (2) AB is not symmetric. 


2.2.20. If A is symmetric, is (1) $A^k$ symmetric, where k is a positive integer? (2) Is $A^{-1}$ symmetric if A is also nonsingular? (3) For any matrix A, if $A^2$ is symmetric is A symmetric? Justify your answer by examples (2 each) and counter examples (2 each).

2.2.21. If A is a skew symmetric matrix then show that $B = (I + A)(I - A)^{-1}$ is orthonormal, that is, BB' = I, B'B = I, and that $B^{-1} = B'$.


2.2.22. For any matrix A show that $\mathrm{tr}(B^{-1}AB) = \mathrm{tr}(A)$ where $B^{-1}$ exists and the product is defined.

2.2.23. If $A = (a_{ij}(t))$ where each element of A is a function of the real variable t, then


2.2.24. Right and left inverses. Let A be any m x n matrix. Any matrix B for which BA = I is called a left-inverse of A and any matrix C for which AC = I is called a right-inverse of A. Any matrix X such that AX = I, XA = I is called the regular inverse of A. Compute a left inverse B, and all possible right inverses C for the matrix

$$
A = \begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \end{bmatrix}.
$$


2.2.25. Show that, in general, if L is a left inverse of any matrix A then L’ is a right 
inverse of A' and that if R is a right inverse of A then R' is a left inverse of A'. 


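2.2.26. Let A be m x n and B be n x r with $\alpha_1, \ldots, \alpha_m$ being the rows of A and $\beta_1, \ldots, \beta_r$ being the columns of B. Then show that

$$
AB = \begin{bmatrix} \alpha_1 B \\ \vdots \\ \alpha_m B \end{bmatrix} = (A\beta_1, \ldots, A\beta_r).
$$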


2.2.27. For any square matrix A show that B = A + A' is symmetric, C = A - A' is skew symmetric, $A = \frac{1}{2}B + \frac{1}{2}C$ = sum of a symmetric and a skew symmetric matrix, and that this representation is unique.




2.3 Elementary matrices and elementary operations 


Elementary matrices and elementary operations (multiplications by elementary matrices) have widespread applications in solving systems of linear equations, in determining linear independence and dependence of vectors, in determining the rank of a
matrix, in obtaining a basis for a vector space, in evaluating the inverse of a matrix, 
and so on. Here we will define the basic elementary matrices and then will look into 
various types of operations with elementary matrices. 


Definition 2.3.1 (The basic elementary matrices). The two basic elementary matrices are the following: Consider an $n \times n$ identity matrix $I_n$. If any row (column) of $I_n$ is multiplied by a nonzero scalar then the resulting matrix is called an elementary matrix. This is one basic type of an elementary matrix. We will call this an E type elementary matrix. If any row (column) of $I_n$ is added to any other row (column) of $I_n$ then the resulting matrix is the second basic type of an elementary matrix. We will call this an F type elementary matrix.


The E and F types of elementary matrices are the basic types of elementary matrices. For example, consider a 3 x 3 identity matrix $I = I_3$:

$$
E_1 = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

is an elementary matrix (the first row of $I_3$ is multiplied by 5);

$$
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

is an elementary matrix (the second row of $I_3$ is multiplied by -2);

$$
E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \alpha \end{bmatrix}
$$

is an elementary matrix (the third row of $I_3$ is multiplied by $\alpha$ for $\alpha \neq 0$);

$$
F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

is an elementary matrix (the first row of $I_3$ is added to the second row);

$$
F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
$$

is an elementary matrix (the first row of $I_3$ is added to the third row);

$$
F_3 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

is an elementary matrix (the third row of $I_3$ is added to the first row);

$$
F_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
$$

is an elementary matrix (the second row of $I_3$ is added to the third row).

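The row which is added remains the same. The net effect is on the row to which another row is added. Before we start operating with these basic elementary matrices let us look at their inverses. What is the matrix which nullifies the effect on $I_n$, that is, the matrix for which $I_n$ is regained when a given elementary matrix is premultiplied by it? For example, consider $E_1$. What is $E_1^{-1}$ such that $E_1^{-1}E_1 = I$? Multiplication of a certain row of an identity matrix by a nonzero scalar can be nullified by dividing the same row by that nonzero scalar. Therefore

$$
E_1^{-1} = \begin{bmatrix} \frac{1}{5} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{so that}\quad E_1^{-1}E_1 = I_3 = E_1E_1^{-1};
$$

$$
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \;\Rightarrow\; E_2^{-1}E_2 = I_3 = E_2E_2^{-1};
$$

$$
E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \frac{1}{\alpha} \end{bmatrix} \;\Rightarrow\; E_3^{-1}E_3 = I_3 = E_3E_3^{-1}, \quad \alpha \neq 0.
$$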


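Thus the inverses for the E series of elementary matrices are easily obtained. Similar is the situation whatever be the order n in $I_n$. Now, look at the F series of elementary matrices. How can the effect in $F_1$ be nullified so that $F_1$ multiplied by a matrix, denoted by $F_1^{-1}$, gives back I? $F_1$ is obtained by adding the first row to the second row. Naturally the effect can be nullified by adding (-1) times the first row to the second row. That is,

$$
F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{so that}\quad F_1^{-1}F_1 = I_3 = F_1F_1^{-1};
$$

$$
F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \;\Rightarrow\; F_2^{-1}F_2 = I_3 = F_2F_2^{-1};
$$

$$
F_3^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \;\Rightarrow\; F_3^{-1}F_3 = I_3 = F_3F_3^{-1};
$$

$$
F_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix} \;\Rightarrow\; F_4^{-1}F_4 = I_3 = F_4F_4^{-1}.
$$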


From the way we defined the basic elementary matrices it is evident that the regular 
inverses exist for all elementary matrices. 


(i) Regular inverses exist for all basic elementary matrices. Hence elementary matrices, and thereby products of elementary matrices, are nonsingular.
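The inverses written down by inspection can be confirmed with exact arithmetic. The following is a minimal sketch using Python's `fractions` module; the constructors `e_type` and `f_type` are illustrative names, rows are indexed from 0, and `f_type` accepts a multiplier `c` (an extension of the basic F type, where c = 1) so that the inverse can be built the same way.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def e_type(n, row, c):
    """I_n with the given row (0-based) multiplied by the nonzero scalar c."""
    if c == 0:
        raise ValueError("c must be nonzero")
    E = identity(n)
    E[row][row] = Fraction(c)
    return E

def f_type(n, src, dst, c=1):
    """I_n with c times row src added to row dst (0-based, src != dst)."""
    if src == dst:
        raise ValueError("src and dst must differ")
    F = identity(n)
    F[dst][src] = Fraction(c)
    return F

I3 = identity(3)
E1, E1_inv = e_type(3, 0, 5), e_type(3, 0, Fraction(1, 5))   # multiply / divide row 1 by 5
F1, F1_inv = f_type(3, 0, 1), f_type(3, 0, 1, -1)            # add / subtract row 1 to row 2

print(matmul(E1_inv, E1) == I3, matmul(E1, E1_inv) == I3)
print(matmul(F1_inv, F1) == I3, matmul(F1, F1_inv) == I3)
```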


Definition 2.3.2. For a given square matrix A, if there exists a matrix B such that AB = I, BA = I then the regular inverse of A exists and it is denoted by $B = A^{-1}$. In this case A is called a nonsingular matrix. Square matrices for which regular inverses do not exist are called singular matrices.

(ii) A square null matrix is a singular matrix.

(iii) A diagonal matrix with all nonzero diagonal elements is nonsingular and if there is at least one zero diagonal element then the diagonal matrix is singular.

(iv) A triangular matrix (lower or upper) with all nonzero diagonal elements is nonsingular and it is singular if there is at least one zero diagonal element.


2.3.1 Premultiplication of a matrix by elementary matrices 


Consider an arbitrary 3 x 3 matrix $A = (a_{ij})$ and consider the $E_1$ of the previous section. Then

$$
E_1A = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} 5a_{11} & 5a_{12} & 5a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
$$

Note that $E_1$ is created by multiplying the first row of an identity matrix by 5. When we premultiply any 3 x 3 matrix A by $E_1$ the effect is exactly the same, that is, the first row of A is multiplied by 5.

(v) If E is an elementary matrix created from $I_n$ by multiplying the i-th row by a nonzero scalar c and if any n x n matrix $A = (a_{ij})$ is premultiplied by E, that is EA, the net effect is that the i-th row of A is multiplied by c.

Note that $E^{-1}$ in this case is a diagonal matrix with the i-th diagonal element $\frac{1}{c}$ and all other diagonal elements unities. If we again premultiply EA with this matrix $E^{-1}$ then we get back A. That is,

$$
E^{-1}(EA) = A.
$$


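Now, consider premultiplication of a 3 x 3 matrix $A = (a_{ij})$ by the $F_1$ of the previous section. That is,

$$
F_1A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} + a_{21} & a_{12} + a_{22} & a_{13} + a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.
$$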


Observe that $F_1$ is created by adding the first row to the second row of an identity matrix. (The first row remains the same, the second row becomes the original second row plus the original first row.) The net effect of premultiplication of A by $F_1$ is exactly the same: the first row of A is added to the second row of A. In general, we have the following result:

(vi) If F is an elementary matrix created by adding the i-th row to the j-th row in $I_n$ and if an arbitrary n x n matrix A is premultiplied by F, that is FA, the net effect is the same, that is, the i-th row of A is added to the j-th row of A.


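Example 2.3.1. What are the net effects of the following operations? (a) $E_2F_1E_1A$, (b) $E_4F_2E_3E_2F_1E_1A$ where

$$
E_1 = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = E_1^{-1},
$$

$$
E_3 = \begin{bmatrix} -3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, \quad E_4 = E_3^{-1},
$$

$$
A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 3 & 1 & 4 \end{bmatrix}.
$$

Solution 2.3.1.

$$
E_1A = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} -2 & -2 & 2 \\ 2 & 0 & -1 \\ 3 & 1 & 4 \end{bmatrix},
$$

$$
F_1E_1A = F_1(E_1A) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} -2 & -2 & 2 \\ 2 & 0 & -1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} -2 & -2 & 2 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix},
$$

$$
E_2F_1E_1A = E_2(F_1E_1A) = \begin{bmatrix} -\frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} -2 & -2 & 2 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix}.
$$

The net effect of the operations so far is that the (2,1)-th element in A is made zero.

$$
E_3E_2F_1E_1A = E_3(E_2F_1E_1A) = \begin{bmatrix} -3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} -3 & -3 & 3 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix},
$$

$$
F_2(E_3E_2F_1E_1A) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} -3 & -3 & 3 \\ 0 & -2 & 1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} -3 & -3 & 3 \\ 0 & -2 & 1 \\ 0 & -2 & 7 \end{bmatrix}.
$$

The net effect of the operations so far is that the first column elements, except the first one, are reduced to zeros. Then operation on the left with $E_4 = E_3^{-1}$ gives

$$
E_4(F_2E_3E_2F_1E_1A) = \begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 1 \\ 0 & -2 & 7 \end{bmatrix}.
$$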


Example 2.3.2. Reduce the matrix A in Example 2.3.1 to a triangular form by premul- 
tiplication with elementary matrices (by elementary operations). 


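Solution 2.3.2. Part of the work is already done in Example 2.3.1. Now we continue. Consider the elementary matrices

$$
E_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.
$$

Then $E_5$ operating on the reduced form from Example 2.3.1 gives

$$
E_5\begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 1 \\ 0 & -2 & 7 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 0 & -2 & 1 \\ 0 & -2 & 7 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & -2 & 7 \end{bmatrix}.
$$

Now, $F_3$ operating on the above form gives

$$
F_3\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & -2 & 7 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & -2 & 7 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 6 \end{bmatrix}.
$$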


This is an upper triangular matrix. Hence the solution is complete. 
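The whole chain of premultiplications in Examples 2.3.1 and 2.3.2 can be replayed mechanically. This sketch uses exact fractions; `matmul` and `diag` are illustrative helper names, and the matrices are the ones defined in the two examples.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def diag(*d):
    return [[Fr(d[i]) if i == j else Fr(0) for j in range(len(d))]
            for i in range(len(d))]

A = [[1, 1, -1], [2, 0, -1], [3, 1, 4]]

E1 = diag(-2, 1, 1)
F1 = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]     # row 1 added to row 2
E2 = diag(Fr(-1, 2), 1, 1)                 # inverse of E1
E3 = diag(-3, 1, 1)
F2 = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]     # row 1 added to row 3
E4 = diag(Fr(-1, 3), 1, 1)                 # inverse of E3
E5 = diag(1, -1, 1)
F3 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]     # row 2 added to row 3

M = A
for step in (E1, F1, E2, E3, F2, E4, E5, F3):
    M = matmul(step, M)                    # each step premultiplies the current matrix

U = [[int(x) for x in row] for row in M]   # all entries are integers here
print(U)   # [[1, 1, -1], [0, 2, -1], [0, 0, 6]]
```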


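Note that if A is to be recovered from the final form then write the final equation as,

$$
F_3E_5E_4F_2E_3E_2F_1E_1A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 6 \end{bmatrix}.
$$

Premultiply both sides by the inverses $F_3^{-1}$, $E_5^{-1}$ and so on, in that order. Then

$$
A = E_1^{-1}F_1^{-1}E_2^{-1}E_3^{-1}F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1}\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 6 \end{bmatrix}.
$$

But

$$
F_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad
E_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_4^{-1} = \begin{bmatrix} -3 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix},
$$

$$
E_3^{-1} = \begin{bmatrix} -\frac{1}{3} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2^{-1} = \begin{bmatrix} -2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_1^{-1} = \begin{bmatrix} -\frac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

Multiplications can be carried out by inspection, remembering that if any matrix is premultiplied by an elementary matrix the same effect is there on that matrix. For example, $E_5^{-1}F_3^{-1}$ will have the effect that the second row of $F_3^{-1}$ is multiplied by (-1). That is,

$$
E_5^{-1}F_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.
$$

Premultiplying this by $E_4^{-1}$ will have the effect that the first row is multiplied by (-3). That is,

$$
E_4^{-1}E_5^{-1}F_3^{-1} = \begin{bmatrix} -3 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & -1 & 1 \end{bmatrix}.
$$

Premultiplying this by $F_2^{-1}$ has the effect that (-1) times the first row is added to the third row, and so on:

$$
F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1} = \begin{bmatrix} -3 & 0 & 0 \\ 0 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix};
$$

$$
E_3^{-1}(F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1}) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix};
$$

$$
E_2^{-1}(E_3^{-1}F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1}) = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix}.
$$

Premultiplying this with $F_1^{-1}$ has the effect that (-1) times the first row is added to the second row. That is,

$$
F_1^{-1}(E_2^{-1}E_3^{-1}F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1}) = \begin{bmatrix} -2 & 0 & 0 \\ 2 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix}.
$$

Then

$$
E_1^{-1}(F_1^{-1}E_2^{-1}E_3^{-1}F_2^{-1}E_4^{-1}E_5^{-1}F_3^{-1}) = \begin{bmatrix} 1 & 0 & 0 \\ 2 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix}.
$$

Thus

$$
A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 3 & 1 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 2 & -1 & 0 \\ 3 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 0 & 2 & -1 \\ 0 & 0 & 6 \end{bmatrix}.
$$

It can be verified by straight multiplication of the two matrices on the right and then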
comparing it with the given matrix in the example. The whole process is done in de- 
tail in Examples 2.3.1 and 2.3.2 so that the student can clearly understand the effects 
of premultiplications by elementary matrices, how to write down the inverses of ele- 
mentary matrices or products of elementary matrices without doing any computation 
etc so that at later stages many such operations can be carried out simultaneously in- 
stead of doing it one at a time. The final solution is that A is written as a product of a 
lower triangular matrix, say L, and an upper triangular matrix, say U. That is, 


$$
A = LU. \tag{2.3.1}
$$


Is the reduction of any given n x n matrix A to a product of n x n lower and upper 
triangular matrices possible? We can answer this question by making a series of ob- 
servations. 


(vii) Elementary matrices of the E category are always diagonal. 


(viii) Elementary matrices of the F category are always lower triangular if the ele- 
mentary matrices are created by adding the i-th row to the j-th row of an identity 
matrix, with i <j. [If i > j the elementary matrix is no longer lower triangular.] 


(ix) Product of a lower triangular matrix with a lower triangular matrix or with a 
diagonal matrix remains lower triangular. 


(x) Regular inverse of a lower triangular matrix is lower triangular (write it as a 
product of elementary matrices and prove the result) and that of a diagonal matrix 
is diagonal. 


Therefore in attempting to reduce a given matrix A to the form LU if only the steps 
in (vii) to (x) are involved we have a possibility of obtaining the form LU. If during 
the process, at any stage, the lower triangular nature of an elementary matrix of the 
F category is violated then we cannot expect the form LU unless the effect of that ele- 
mentary operation is nullified by another elementary matrix during the process. Still 
we may not be able to get the form LU. Note from the examples that we could reduce 
the elements below the leading diagonal to zeros because we had a nonzero diago- 
nal element sitting there at that stage of the operations. If the first row first column 
element in A was zero then by adding suitable multiples of the first row to the other 
rows we could not reduce nonzero elements in the same column to zeros. We can bring 
in a nonzero element to the (1, 1)-th position by using an elementary matrix of the F 
category provided there is at least one nonzero element in the first column. But this F 
will not be lower triangular since an i-th row will be added to the j-th row with i > j. 
After reducing all elements in the first column to zeros except the first element, we 
can use the second diagonal element, (2, 2)-th element, in the resulting matrix. If this 
resulting (2, 2)-th element is zero the same situations as described above will arise. 
That is, at every stage, the leading diagonal elements must be nonzero in the sense that when we come to reducing the elements below the diagonal on the j-th column to zeros the (j,j)-th element must be nonzero for $j = 1, \ldots, n - 1$. Then we end up with the form A = LU where L is lower triangular as well as nonsingular, being a product of elementary matrices, and U may or may not be nonsingular. If A itself is nonsingular then both L and U will be nonsingular and in this case,

$$
A = LU \;\Rightarrow\; A^{-1} = U^{-1}L^{-1}
$$

where $U^{-1}$ and $L^{-1}$ are easier to evaluate, being triangular, compared to the evaluation of the inverse of a general n x n matrix.
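The condition on the leading diagonal elements can be seen in a short elimination routine. The sketch below is not the step-by-step procedure of the examples: it only adds multiples of a row to the rows below it (no row scaling), so it returns a lower triangular L with unit diagonal, which differs from the L found above by the sign of one row. The function name `lu_no_pivot` is an assumption for illustration, and it stops conservatively whenever a pivot is zero, mirroring the condition stated in the text.

```python
from fractions import Fraction

def lu_no_pivot(A):
    """Return (L, U) with A = L U, L unit lower triangular, U upper triangular.

    Only multiples of a row are added to rows below it, so every step is a
    lower triangular elementary operation. Raises if a zero pivot is met.
    """
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(n - 1):
        if U[j][j] == 0:
            raise ZeroDivisionError(f"zero pivot at position ({j + 1}, {j + 1})")
        for i in range(j + 1, n):
            m = U[i][j] / U[j][j]
            L[i][j] = m                                  # inverse of the operation applied to U
            U[i] = [U[i][k] - m * U[j][k] for k in range(n)]
    return L, U

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1, -1], [2, 0, -1], [3, 1, 4]]    # the matrix of Example 2.3.1
L, U = lu_no_pivot(A)
print(matmul(L, U) == A)                   # True

try:
    lu_no_pivot([[0, 1], [1, 0]])          # hypothetical matrix with a zero (1,1)-th element
except ZeroDivisionError as err:
    print("no LU form by this route:", err)
```

The two factorizations of the same A illustrate that the form LU is not unique unless the diagonal of L (or of U) is fixed.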


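Example 2.3.3. Reduce the following matrix A to a triangular form by elementary operations on the left of A, where

$$
A = \begin{bmatrix} 0 & 1 & -1 \\ 2 & -1 & 2 \\ 1 & 3 & -2 \end{bmatrix}.
$$

Solution 2.3.3. Since the (1,1)-th position has a zero element we add the third row to the first row to bring in a nonzero entry at the (1,1)-th position. This can be achieved by premultiplying with

$$
F_1 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

That is,

$$
F_1A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & -1 \\ 2 & -1 & 2 \\ 1 & 3 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 4 & -3 \\ 2 & -1 & 2 \\ 1 & 3 & -2 \end{bmatrix}.
$$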


Note that the net effect is on the first row and not on the third row. Instead of the third 
row we could have added the second row to the first row which would have brought in 
a 2 at the (1, 1)-th position. But 1 is easier to handle than 2. Instead of operating with 
elementary matrices of the E and F categories we may do two or more such opera- 
tions together and write the resulting products of elementary matrices as G category 
matrices. Let 


$$
G_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

which is evidently a product of the basic elementary matrices. The net effect of premultiplying with $G_1$ is that (-2) times the first row is added to the second row. Later on, we will make a statement of such a premultiplication as follows:

$$
\alpha(i) + (j) \Rightarrow \tag{2.3.2}
$$

which means that $\alpha$ times the i-th row is added to the j-th row, where the i-th row remains the same whereas the j-th row changes to the original j-th row plus $\alpha$ times the original i-th row. In terms of the notation in (2.3.2) we can write $G_1$ as equivalent to the operation

$$
-2(1) + (2) \Rightarrow \quad G_1F_1A = \begin{bmatrix} 1 & 4 & -3 \\ 0 & -9 & 8 \\ 1 & 3 & -2 \end{bmatrix}.
$$

Let

$$
G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \quad\text{or}\quad -1(1) + (3) \Rightarrow.
$$

Then

$$
G_2G_1F_1A = \begin{bmatrix} 1 & 4 & -3 \\ 0 & -9 & 8 \\ 0 & -1 & 1 \end{bmatrix}.
$$

Since $F_1$ is no longer lower triangular we do not expect A to be written as LU, a product of lower and upper triangular matrices. Now to make the (3,2)-th element zero we have two choices: either divide the second row by -9, which produces a fraction at the (2,3)-th position, and then operate with the resulting second row, or interchange the third and second rows, which is a product of elementary operations, and then operate with the new second row. This last procedure avoids fractions. Hence consider

$$
G_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
$$

and then

$$
G_3(G_2G_1F_1A) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 4 & -3 \\ 0 & -9 & 8 \\ 0 & -1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 4 & -3 \\ 0 & -1 & 1 \\ 0 & -9 & 8 \end{bmatrix}.
$$

Let

$$
G_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -9 & 1 \end{bmatrix} \quad\text{or}\quad -9(2) + (3) \Rightarrow
$$

(-9 times the second row is added to the third row). Then

$$
G_4G_3G_2G_1F_1A = \begin{bmatrix} 1 & 4 & -3 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}.
$$

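This is the desired triangular form. Since $G_4, G_3, G_2, G_1$ are products of the basic elementary matrices they are nonsingular and their regular inverses exist. Again, these inverses can be written down by inspection. In fact,

$$
F_1^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
G_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},
$$

$$
G_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad
G_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 9 & 1 \end{bmatrix}.
$$

From the above representation

$$
A = F_1^{-1}G_1^{-1}G_2^{-1}G_3^{-1}G_4^{-1}\begin{bmatrix} 1 & 4 & -3 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}.
$$

But $G_3^{-1}$ operating on $G_4^{-1}$ will interchange the second and third rows of the original $G_4^{-1}$. That is,

$$
G_3^{-1}G_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 9 & 1 \\ 0 & 1 & 0 \end{bmatrix};
$$

$$
G_2^{-1}G_3^{-1}G_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 9 & 1 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 9 & 1 \\ 1 & 1 & 0 \end{bmatrix};
$$

$$
G_1^{-1}G_2^{-1}G_3^{-1}G_4^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 9 & 1 \\ 1 & 1 & 0 \end{bmatrix};
$$

$$
F_1^{-1}G_1^{-1}G_2^{-1}G_3^{-1}G_4^{-1} = \begin{bmatrix} 0 & -1 & 0 \\ 2 & 9 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
$$

Thus A has the decomposition

$$
A = \begin{bmatrix} 0 & -1 & 0 \\ 2 & 9 & 1 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 4 & -3 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \end{bmatrix}
$$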

where the first matrix on the right is not lower triangular but nonsingular (being prod- 
uct of elementary matrices) but the second matrix on the right is upper triangular. 
Thus, in general, we can have a decomposition of an n x n matrix A to the form 


$$
A = BU \tag{2.3.3}
$$

where B is nonsingular and U is upper triangular. This U will be nonsingular if A is nonsingular. In our example, U is nonsingular since none of the diagonal elements in U is zero.


(xi) Interchange of two rows (columns) is a product of elementary operations. 
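Statement (xi) can be checked by building a row interchange from the two basic types only. The sequence of six steps below is one possible choice, not necessarily the one the authors have in mind; `e_type`, `f_type` and `swap_as_product` are illustrative names and rows are indexed from 0.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def e_type(n, row, c):
    """E type: I_n with the given row multiplied by the nonzero scalar c."""
    E = identity(n)
    E[row][row] = Fraction(c)
    return E

def f_type(n, src, dst):
    """F type: I_n with row src added to row dst."""
    F = identity(n)
    F[dst][src] = Fraction(1)
    return F

def swap_as_product(n, i, j):
    """Matrix interchanging rows i and j, built only from E and F types."""
    steps = [
        f_type(n, j, i),    # (i) <- (i) + (j)
        e_type(n, i, -1),   # (i) <- -(i)
        f_type(n, i, j),    # (j) <- (j) + (i), now minus the old row i
        e_type(n, i, -1),   # (i) <- -(i)
        f_type(n, j, i),    # (i) <- (i) + (j), now the old row j
        e_type(n, j, -1),   # (j) <- -(j), now the old row i
    ]
    M = identity(n)
    for S in steps:
        M = matmul(S, M)    # each later step premultiplies the earlier ones
    return M

G3 = swap_as_product(3, 1, 2)   # interchange the second and third rows
print(G3 == [[1, 0, 0], [0, 0, 1], [0, 1, 0]])   # True
```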


2.3.2 Reduction of a square matrix into a diagonal form 


Here we consider the reduction of a square matrix into a diagonal form by premulti- 
plication with elementary matrices, that is, by premultiplication alone. Later we will 
consider the reduction to a diagonal form by postmultiplication alone, and then re- 
duction to a diagonal form by pre and post multiplications. 


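Example 2.3.4. Reduce the following matrix A to a diagonal form by premultiplication alone, where

$$
A = \begin{bmatrix} 0 & 1 & -2 \\ 1 & 2 & 5 \\ 3 & -1 & 0 \end{bmatrix}.
$$

Solution 2.3.4. Let

$$
F_1 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
$$

that is, add the second row to the first row ($(2) + (1) \Rightarrow$).

$$
F_1A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & -2 \\ 1 & 2 & 5 \\ 3 & -1 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 3 & 3 \\ 1 & 2 & 5 \\ 3 & -1 & 0 \end{bmatrix}.
$$

Let

$$
G_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}
$$

or $G_1$: $-1(1) + (2) \Rightarrow$; $G_2$: $-3(1) + (3) \Rightarrow$.

$$
G_2G_1F_1A = \begin{bmatrix} 1 & 3 & 3 \\ 0 & -1 & 2 \\ 0 & -10 & -9 \end{bmatrix}.
$$

Let

$$
G_3 = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{or}\quad 3(2) + (1) \Rightarrow \quad\text{and}\quad
G_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -10 & 1 \end{bmatrix} \quad\text{or}\quad -10(2) + (3) \Rightarrow.
$$

Then

$$
G_4G_3G_2G_1F_1A = \begin{bmatrix} 1 & 0 & 9 \\ 0 & -1 & 2 \\ 0 & 0 & -29 \end{bmatrix}.
$$

Let

$$
E_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{29} \end{bmatrix} \quad\text{or}\quad -\frac{1}{29}(3) \Rightarrow
$$

or the third row is divided by (-29). Then

$$
E_4G_4G_3G_2G_1F_1A = \begin{bmatrix} 1 & 0 & 9 \\ 0 & -1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.
$$

Let

$$
G_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{or}\quad -2(3) + (2) \Rightarrow;
$$

$$
G_6 = \begin{bmatrix} 1 & 0 & -9 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad\text{or}\quad -9(3) + (1) \Rightarrow.
$$

Then

$$
G_6G_5E_4G_4G_3G_2G_1F_1A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
$$

That is, BA is a diagonal matrix where

$$
B = G_6G_5E_4G_4G_3G_2G_1F_1
$$

is a nonsingular matrix, being a product of elementary matrices. Therefore, in this representation

$$
A = B^{-1} \text{ multiplied by a diagonal matrix}, \quad
B^{-1} = F_1^{-1}G_1^{-1}G_2^{-1}G_3^{-1}G_4^{-1}E_4^{-1}G_5^{-1}G_6^{-1}.
$$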


Note that if A is nonsingular we can expect the diagonal matrix to have all nonzero 
diagonal elements. If A is singular then at least one diagonal element in the diagonal 
matrix will be zero. Thus for any n x n matrix A we have the representation 


$$
CA = D \quad\text{or}\quad A = C^{-1}D \tag{2.3.4}
$$

where C is nonsingular and D is diagonal. If A is singular then D is singular and if A is nonsingular then D is nonsingular. We can also represent A in the forms

$$
A = DB, \quad A = C_1D_1B_1,
$$

where $C, C_1$ and $B, B_1$ are nonsingular matrices and $D, D_1$ are diagonal matrices. These will be considered later after looking into the solution of a system of linear equations.
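The matrix C of (2.3.4) can be accumulated while the reduction is carried out by applying every row operation to A with an identity matrix written next to it. The sketch below replays the operations of Example 2.3.4 in the notation of (2.3.2); the helper names `add_multiple` and `scale` are assumptions for illustration, and rows are numbered from 1 as in the text.

```python
from fractions import Fraction as Fr

def add_multiple(M, alpha, i, j):
    """The operation alpha(i) + (j) =>: add alpha times row i to row j (1-based)."""
    M[j - 1] = [x + alpha * y for x, y in zip(M[j - 1], M[i - 1])]

def scale(M, alpha, i):
    """The operation alpha(i) =>: multiply row i by the nonzero scalar alpha (1-based)."""
    M[i - 1] = [alpha * x for x in M[i - 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1, -2], [1, 2, 5], [3, -1, 0]]

# Work on [A | I]: the right-hand block accumulates B with BA = D.
M = [[Fr(x) for x in row] + [Fr(int(i == j)) for j in range(3)]
     for i, row in enumerate(A)]

add_multiple(M, 1, 2, 1)      # F1:  (2) + (1) =>
add_multiple(M, -1, 1, 2)     # G1: -1(1) + (2) =>
add_multiple(M, -3, 1, 3)     # G2: -3(1) + (3) =>
add_multiple(M, 3, 2, 1)      # G3:  3(2) + (1) =>
add_multiple(M, -10, 2, 3)    # G4: -10(2) + (3) =>
scale(M, Fr(-1, 29), 3)       # E4: third row divided by -29
add_multiple(M, -2, 3, 2)     # G5: -2(3) + (2) =>
add_multiple(M, -9, 3, 1)     # G6: -9(3) + (1) =>

D = [row[:3] for row in M]
B = [row[3:] for row in M]
print(D == [[1, 0, 0], [0, -1, 0], [0, 0, 1]])   # True
print(matmul(B, A) == D)                         # True
```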


2.3.3 Solving a system of linear equations 


As we have already seen, a system of m linear equations in n real scalar variables can be written in the form

$$
AX = b
$$

where A is the m x n known coefficient matrix, X is the n x 1 vector of variables or unknowns and b is an m x 1 known vector. For example,

$$
\begin{aligned}
x_1 - x_2 + x_3 - 2x_4 &= 5 \\
2x_1 + x_2 - x_3 + x_4 &= 2 \\
x_1 + x_2 + x_3 - 3x_4 &= 4
\end{aligned}
\quad\Rightarrow\quad AX = b
$$

where

$$
A = \begin{bmatrix} 1 & -1 & 1 & -2 \\ 2 & 1 & -1 & 1 \\ 1 & 1 & 1 & -3 \end{bmatrix}, \quad
X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}, \quad
b = \begin{bmatrix} 5 \\ 2 \\ 4 \end{bmatrix}.
$$


Here m = 3, n = 4. One method of solving this system is by successive elimination, 
which the student may be familiar with. This will be quite tedious when the number 
of variables and the number of equations are large. We will solve this system by ele- 
mentary operations. If there is a solution for AX = b, that is, if there exists a vector X 
such that AX =b is satisfied then 


BAX = Bb 


has the same solution as the original equation AX = b as long as B is a nonsingular 
matrix. Note that when B is nonsingular $B^{-1}$ exists, and by premultiplying both sides by $B^{-1}$ we get back the original equation

$$
B^{-1}(BAX) = B^{-1}(Bb) \;\Rightarrow\; AX = b.
$$

Since elementary matrices or products of them are nonsingular we may operate on both sides of AX = b by elementary matrices. Premultiplication by elementary matrices will be stated by using our notation in (2.3.2). When premultiplying both sides of AX = b by elementary matrices the effects will be on A and b. Hence we need to look only at the effects rather than writing the whole system of equations each time. A convenient way of writing A and b is to write A and then b separated by a vertical line or two vertical lines to indicate that A and b are on separate sides of the equation AX = b. For the illustrative example, this representation is then,

$$
\left[\begin{array}{rrrr|r} 1 & -1 & 1 & -2 & 5 \\ 2 & 1 & -1 & 1 & 2 \\ 1 & 1 & 1 & -3 & 4 \end{array}\right] \tag{a}
$$

Now the idea is to get rid of the elements in the first column, except the (1,1)-th element, the elements in the second column, except the element at the (2,2)-th position, and so on; or to reduce the elements below the leading diagonal of A to zeros; or to reduce the elements below as well as above the leading diagonal to zeros. Instead of doing one operation at a time we can do several operations simultaneously:


$$
-2(1) + (2); \quad -1(1) + (3) \Rightarrow
$$

This means that (-2) times the first row (on both sides of the vertical line) is added to the second row and then (-1) times the first row is added to the third row. The first row, the one we are operating with, remains the same and the other rows change. The net result of these operations will give the following configuration: [Always write the row that you are operating with first, since it is not going to change, and then write the result of the operations on the other rows, on both sides.]

$$
\left[\begin{array}{rrrr|r} 1 & -1 & 1 & -2 & 5 \\ 0 & 3 & -3 & 5 & -8 \\ 0 & 2 & 0 & -1 & -1 \end{array}\right] \tag{b}
$$

For a triangular type reduction we will try to get rid of the (3,2)-th element by using the (2,2)-th element. This can be done by first dividing the second row by 3 and then adding (-2) times the second row to the third row:

$$
\tfrac{1}{3}(2); \quad -2(2) + (3) \Rightarrow \quad
\left[\begin{array}{rrrr|r} 1 & -1 & 1 & -2 & 5 \\ 0 & 1 & -1 & \frac{5}{3} & -\frac{8}{3} \\ 0 & 0 & 2 & -\frac{13}{3} & \frac{13}{3} \end{array}\right] \tag{c}
$$


In this triangular type reduction we cannot go further. One way of solving the system is to write the whole system at this stage and then solve starting from the last equation. Translating (c) in terms of the original variables we have the following system of equations:

$$
\begin{aligned}
x_1 - x_2 + x_3 - 2x_4 &= 5 \\
x_2 - x_3 + \tfrac{5}{3}x_4 &= -\tfrac{8}{3} \\
2x_3 - \tfrac{13}{3}x_4 &= \tfrac{13}{3}.
\end{aligned}
$$

There are infinitely many solutions in this case because any one variable, for example $x_4$, can be free. We can assign any value to $x_4$ and can solve for the remaining variables.

For example let $x_4 = 0$. Then we have $2x_3 = \frac{13}{3}$ or $x_3 = \frac{13}{6}$. Then from the next line above

$$
x_2 = x_3 - \tfrac{5}{3}x_4 - \tfrac{8}{3} = \tfrac{13}{6} - \tfrac{8}{3} = -\tfrac{1}{2}
$$

and finally

$$
x_1 = 5 + x_2 - x_3 + 2x_4 = 5 - \tfrac{1}{2} - \tfrac{13}{6} = \tfrac{7}{3}.
$$

Therefore one solution is

$$
(x_1, x_2, x_3, x_4) = \left(\tfrac{7}{3}, -\tfrac{1}{2}, \tfrac{13}{6}, 0\right).
$$
By taking other values for $x_4$ we get other solutions. Note that when solving a system such as the one above it is wiser to verify the final answer by substituting in the original system because it is likely that we may have made some computational errors during the process.

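If we wish to reduce our system to a diagonal type format then at the stage (c) above, add the second row to the first row to obtain

$$
(2) + (1) \Rightarrow \quad
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -\frac{1}{3} & \frac{7}{3} \\ 0 & 1 & -1 & \frac{5}{3} & -\frac{8}{3} \\ 0 & 0 & 2 & -\frac{13}{3} & \frac{13}{3} \end{array}\right] \tag{d}
$$

Now divide the third row by 2 and then add the third row to the second row:

$$
\tfrac{1}{2}(3); \quad (3) + (2) \Rightarrow \quad
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -\frac{1}{3} & \frac{7}{3} \\ 0 & 1 & 0 & -\frac{1}{2} & -\frac{1}{2} \\ 0 & 0 & 1 & -\frac{13}{6} & \frac{13}{6} \end{array}\right] \tag{e}
$$

Now with $x_4 = 0$ we can read off values of the other variables from the right side itself. That is, $x_3 = \frac{13}{6}$, $x_2 = -\frac{1}{2}$, $x_1 = \frac{7}{3}$. Thus one solution is

$$
(x_1, x_2, x_3, x_4) = \left(\tfrac{7}{3}, -\tfrac{1}{2}, \tfrac{13}{6}, 0\right).
$$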
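The advice to verify the answer by substitution is easy to follow with exact fractions. In this sketch the function `solution` reads $x_1, x_2, x_3$ off the diagonal type form (e) for any chosen value of the free variable $x_4$; the function names are illustrative.

```python
from fractions import Fraction as Fr

A = [[1, -1, 1, -2],
     [2, 1, -1, 1],
     [1, 1, 1, -3]]
b = [5, 2, 4]

def solution(x4):
    """Solve the three rows of form (e) for x1, x2, x3, given the free variable x4."""
    x4 = Fr(x4)
    return [Fr(7, 3) + Fr(1, 3) * x4,      # x1 - (1/3) x4  =  7/3
            Fr(-1, 2) + Fr(1, 2) * x4,     # x2 - (1/2) x4  = -1/2
            Fr(13, 6) + Fr(13, 6) * x4,    # x3 - (13/6) x4 = 13/6
            x4]

def residual(x):
    """AX - b for a candidate solution x; all zeros means x solves the system."""
    return [sum(a * xi for a, xi in zip(row, x)) - bi for row, bi in zip(A, b)]

for x4 in (0, 1, -3):
    x = solution(x4)
    print([str(v) for v in x], residual(x) == [0, 0, 0])
```

With $x_4 = 0$ this reproduces the solution found above, and every other choice of $x_4$ gives another solution of the original system.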

When doing elementary operations to solve a system of equations the following points 
are worth observing: 


(xii) Interchange the rows (changing the order of the equations does not affect the solutions), if necessary, to bring a nonzero number to the (1,1)-th position to start with. Repeat the same technique when dealing with each subsequent (i,i)-th position on the way.


(xiii) If a division creates fractions at any stage then multiply the equations with 
appropriate numbers to avoid fractions when adding a constant multiple of a row 
to another row. 

(xiv) At any stage of the operations if any equation results in an impossible state- 
ment, such as on the one side of the vertical line there is a zero only whereas on the 
other side there is a nonzero number, then stop the process. There is no solution for 
the system. 
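The three points above can be sketched as a small routine. This is a minimal illustration rather than the book's procedure: the function name `forward_eliminate` and the two sample systems are hypothetical, the entries are assumed to be integers, and for simplicity the impossible-statement test of point (xiv) is applied once at the end of the sweep instead of at every stage.

```python
def forward_eliminate(aug):
    """Reduce an integer augmented matrix [A | b] to triangular type form.

    Returns the reduced rows, or None if some row reads 0 = nonzero.
    """
    rows = [list(r) for r in aug]
    m, n = len(rows), len(rows[0]) - 1
    pivot_row = 0
    for col in range(n):
        if pivot_row == m:
            break
        # (xii) interchange rows to bring a nonzero entry to the pivot position
        swap = next((r for r in range(pivot_row, m) if rows[r][col] != 0), None)
        if swap is None:
            continue
        rows[pivot_row], rows[swap] = rows[swap], rows[pivot_row]
        p = rows[pivot_row][col]
        for r in range(pivot_row + 1, m):
            q = rows[r][col]
            if q != 0:
                # (xiii) scale instead of dividing: p*(row r) - q*(pivot row)
                rows[r] = [p * a - q * b for a, b in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # (xiv) a row 0 ... 0 | c with c != 0 is an impossible statement
    for r in rows:
        if all(v == 0 for v in r[:-1]) and r[-1] != 0:
            return None
    return rows

# Hypothetical sample systems, each row written as [coefficients..., right side].
consistent = [[0, 1, 1, 3], [2, 1, -1, 1], [4, 3, 1, 7]]
impossible = [[1, 1, 2], [2, 2, 5]]
print(forward_eliminate(consistent))
print(forward_eliminate(impossible))
```

Scaling by the pivot keeps every entry an integer, which is the point of (xiii); the price is that entries can grow, so for larger systems exact fractions may be the more convenient choice.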


Example 2.3.5. Solve the following system of linear equations if there exists a solution:

$$
\begin{aligned}
x_1 - x_2 + 2x_3 &= 2\\
2x_1 + 2x_2 - x_3 &= 3\\
3x_1 + x_2 + x_3 &= 5.
\end{aligned}
$$
Solution 2.3.5. To start with we do not know whether the system has a solution or not. Hence start the process and continue. Writing as before

$$
\left[\begin{array}{rrr|r}
1 & -1 & 2 & 2\\
2 & 2 & -1 & 3\\
3 & 1 & 1 & 5
\end{array}\right]
$$

$$
-2(1)+(2);\ -3(1)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|r}
1 & -1 & 2 & 2\\
0 & 4 & -5 & -1\\
0 & 4 & -5 & -1
\end{array}\right]
$$

$$
-1(2)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|r}
1 & -1 & 2 & 2\\
0 & 4 & -5 & -1\\
0 & 0 & 0 & 0
\end{array}\right].
$$

The last equation resulted in the statement $0 = 0$, which is a valid statement. Thus the last row disappears. We can start solving from this triangular type format or try to get rid of the element at the (1, 2)-th position. This can be done in two steps, at the same time avoiding fractions. Multiply the first row by 4 and then add the second row to the first row. That is,

$$
4(1);\ (2)+(1) \;\Rightarrow\;
\left[\begin{array}{rrr|r}
4 & 0 & 3 & 7\\
0 & 4 & -5 & -1
\end{array}\right].
$$

Writing the resulting equations we have

$$
\begin{aligned}
4x_1 + 3x_3 &= 7\\
4x_2 - 5x_3 &= -1.
\end{aligned}
$$

There are several solutions. We can assign an arbitrary value to $x_3$. For example, let $x_3 = 0$; then one solution is

$$(x_1, x_2, x_3) = \left(\tfrac{7}{4}, -\tfrac{1}{4}, 0\right).$$
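Since $x_3$ is free, the two reduced equations give the whole one-parameter family $x_1 = (7 - 3x_3)/4$, $x_2 = (5x_3 - 1)/4$. The following sketch substitutes several members of this family back into the original three equations; the function names are hypothetical and exact fractions are assumed so that the residuals are exactly zero.

```python
from fractions import Fraction as F

# Original system of Example 2.3.5: A x = b
A = [[1, -1, 2], [2, 2, -1], [3, 1, 1]]
b = [2, 3, 5]

def solution_for(x3):
    """Solve 4 x1 + 3 x3 = 7 and 4 x2 - 5 x3 = -1 for a chosen x3."""
    x3 = F(x3)
    return [(7 - 3 * x3) / 4, (5 * x3 - 1) / 4, x3]

def residual(x):
    """Left side minus right side for each original equation."""
    return [sum(a * v for a, v in zip(row, x)) - rhs for row, rhs in zip(A, b)]

for x3 in (0, 1, -2, F(1, 3)):
    print(solution_for(x3), residual(solution_for(x3)))
```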
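Example 2.3.6. Solve the following system of linear equations if there exists a solution:

$$
\begin{aligned}
2x_2 - x_3 &= 1\\
x_1 - x_2 + 3x_3 &= 2\\
x_1 + x_2 + 2x_3 &= 5.
\end{aligned}
$$

Solution 2.3.6. We will interchange the first and second equations to bring a nonzero number to the (1, 1)-th position. The resulting configuration is the following:

$$
\left[\begin{array}{rrr|r}
1 & -1 & 3 & 2\\
0 & 2 & -1 & 1\\
1 & 1 & 2 & 5
\end{array}\right].
$$

Now we start the elementary operations:

$$
-1(1)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|r}
1 & -1 & 3 & 2\\
0 & 2 & -1 & 1\\
0 & 2 & -1 & 3
\end{array}\right];
\qquad
-1(2)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|r}
1 & -1 & 3 & 2\\
0 & 2 & -1 & 1\\
0 & 0 & 0 & 2
\end{array}\right].
$$

The last equation has resulted in the inconsistent statement $0 = 2$ and hence the system has no solution; the system is inconsistent.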


Definition 2.3.3. A system of linear equations $AX = b$ is said to be consistent if there
exists at least one vector $X$ (at least one solution) such that the equation $AX = b$ is
satisfied. If there is no such $X$ the system is said to be inconsistent.


In Example 2.3.6 the system is inconsistent whereas the system in Example 2.3.5 
is consistent. When the system is consistent we may have just one solution (unique 
solution) or many solutions. 


(xv) If $A$ in $AX = b$ is a square and nonsingular matrix then there is a unique solution
and the solution is $X = A^{-1}b$.


(xvi) If $AX = b$, with $A$ a square matrix, and if the system is consistent with $A$ singular
then there are many solutions.


(xvii) If $A$ is $m \times n$, $m < n$, the system $AX = b$ may not have a solution. Consistency of
the system does not go with $m < n$ or $m = n$.


Exercises 2.3 


2.3.1. Write the following matrices as products of the basic elementary matrices of the 
E and F types, if possible (see Definition 2.3.1): 


1000 00 1 0 1-30 0 
1 1 0 0 0100 0 1 0 0 
A, = , A= , A= 
1 |-20 1 0 2 |1 0 0 0 3 jo 1 0 
0 00 1 0000 0 0 0 1 


2.3.2. Prove that the interchange of the i-th and the j-th rows of $I_n$ is a product of
elementary matrices of the E and F types.


2.3.3. Evaluate the regular inverses of the matrices in Exercise 2.3.1, if they exist, by 
first writing them as product of elementary matrices and then inverting them. 


2.3.4. Let $A$ be the matrix obtained by interchanging the i-th and j-th rows of $I_n$. Evaluate the regular inverse of $A$.


2.3.5. Reduce the following matrices into the form $LU$, wherever possible, where $L$ is
lower triangular and $U$ is upper triangular:


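$$
A = \begin{bmatrix} 1 & -1 & -2\\ -2 & 0 & 1\\ 3 & 1 & -1 \end{bmatrix},\quad
B = \begin{bmatrix} 0 & 1 & 1\\ -1 & 1 & 5\\ 2 & 0 & 4 \end{bmatrix},\quad
C = \begin{bmatrix} 1 & 0 & 1\\ 2 & 3 & 1\\ 4 & 3 & 3 \end{bmatrix}.
$$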


2.3.6. Reduce the matrices in Exercise 2.3.5 to the form QU where Q is nonsingular 
and U is upper triangular. 


2.3.7. Reduce the matrices in Exercise 2.3.5 to the form SD where D is diagonal and S 
is nonsingular. 


2.3.8. Under what conditions are the following matrices A and B nonsingular?


2 0 0][d 0 O 1 2 1 
A=|0 c oļ|ļo da o|, B=A/O 2 3 
3 2 ıj[0 0 & 0 0x 


2.3.9. Solve the systems of equations by reducing the coefficient matrices to triangular 
type forms: 


(à -Xy - 3X3 +X, =1 
2x -X2 + X3 +X, =2 

3X, — 2X) + X3 - X4 =Í; 

(b) X2 — 2X3 * 2X, 71 
X1-2X;*X4*X4,-2 

X; - X9 - X4 + 3x, = 4; 

(c) x *x,-X4*x4,-1 
2X1 — X2 + X3 + 2x, =2 

X1 +X +X3 +X4=2 


3X1 — 2X3 + X4 - 2X4 = 4. 


2.3.10. Solve the same systems of equations in Exercise 2.3.9 by reducing the coeffi- 
cient matrices to the diagonal type forms. 


2.3.11. Solve the system of equations 


$$
\begin{aligned}
x_1 + x_2 + x_3 + x_4 &= 0\\
x_1 - x_2 + x_3 - x_4 &= 0\\
-x_1 - x_2 + x_3 + x_4 &= 0\\
x_1 - x_2 - x_3 + x_4 &= 0.
\end{aligned}
$$




2.3.12. Solve the system of equations 


$$
\begin{aligned}
x_1 + x_2 + x_3 + x_4 &= 1\\
x_1 - x_2 + x_3 - x_4 &= 2\\
-x_1 - x_2 + x_3 + x_4 &= 1\\
2x_1 + 2x_3 &= 3.
\end{aligned}
$$

2.3.13. Solve the system of equations

$$
\begin{aligned}
x_1 + x_2 + x_3 + x_4 &= 2\\
x_1 - x_2 + x_3 - x_4 &= 1\\
x_1 + 2x_2 + x_3 + x_4 &= 3\\
2x_1 + x_2 + 2x_3 &= 2.
\end{aligned}
$$


2.3.14. Writing the equations in Exercises 2.3.11, 2.3.12 and 2.3.13 as $AX = b$ and then
reducing $A$ to triangular type forms determine whether or not (a) $A$ is nonsingular in
each case, (b) $A$ can be represented as $LU$ in each case where $L$ is lower triangular and
$U$ is upper triangular, (c) $A$ can be written as $BD$ in each case where $B$ is nonsingular
and $D$ is diagonal.


2.3.15. Writing the equations in Exercises 2.3.11, 2.3.12 and 2.3.13 in the form $AX = b$
write $A = BDC$ where $D$ is diagonal, $B$ and $C$ nonsingular, $B \neq I$, $C \neq I$.


2.3.16. Solve the system of equations 


Xj *2X;tX4-X4,-2 
X2 + 5X4 +X, =4 
X-X, +X; =2 


Xi + 2X) + X4 — X4 — 5. 


2.3.17. If there is a solution for the system in Exercise 2.3.16 what is the geometric interpretation of a solution? If there is no solution in Exercise 2.3.16 explain the geometry.


2.3.18. If there is a system of $n$ linear equations in $n$ unknowns $x_1, \ldots, x_n$, that is,
$AX = b$ where $A$ is $n \times n$ and $X' = (x_1, \ldots, x_n)$, and if $A$ is orthonormal, is the system
consistent? If so, how many solutions are there? Obtain a solution without using elementary operations.


2.3.19. Suppose $A$, $B$ and $A + B$ are all nonsingular $n \times n$ matrices. Show that $A^{-1} + B^{-1}$
is nonsingular and that

$$(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B = B(A + B)^{-1}A.$$
2.3.20. Verify the results in Exercise 2.3.19 for


1 0 
A=|1 2 
2 1 


Om. N 
A` -e e 
In this section we shall consider a method of evaluating the regular inverse of a given
nonsingular matrix by elementary operations, checking for linear dependence of a set 
of vectors by elementary operations, the concepts of row and column ranks of a matrix 
and the rank of a matrix. 

First we deal with a technique of evaluating the regular inverse of a given matrix 
whenever the inverse exists. There are several ways of doing this. One method based 
on elementary operations will be discussed here. 


2.4.1 Inverse of a matrix by elementary operations 


Let $A = (a_{ij})$ be a given $n \times n$ matrix. If the regular inverse of $A$ exists let us denote it by $A^{-1}$. Then from the definition itself

$$AA^{-1} = I \tag{2.4.1}$$

where $I$ is the identity matrix. Equation (2.4.1) remains the same equation if both sides are premultiplied by the same nonsingular matrix, say $B$, in the sense

$$AA^{-1} = I \;\Rightarrow\; BAA^{-1} = B \;\Rightarrow\; AA^{-1} = I.$$

Also note that $B(AA^{-1}) = (BA)A^{-1}$, or we can premultiply $A$ with a nonsingular matrix $B$ and it will be equivalent to premultiplying $(AA^{-1})$ with $B$. Since elementary matrices are nonsingular we will premultiply both sides by elementary matrices and try to reduce $A$ to an identity matrix. If this is possible then $A^{-1}$ is the product of the elementary matrices on the right.


Example 2.4.1. Evaluate the regular inverse of the following matrix $A$ if it exists.

$$A = \begin{bmatrix} 1 & 0 & -1\\ 2 & 1 & 3\\ 3 & 1 & 4 \end{bmatrix}.$$

Solution 2.4.1. If $A$ were a rectangular matrix we would not have attempted to evaluate the regular inverse, since regular inverses do not exist for rectangular matrices. Our matrix $A$ is a square matrix. It may or may not have the regular inverse $A^{-1}$. We start with the assumption that $A^{-1}$ exists and proceed with elementary operations on the left on both sides. If $A^{-1}$ does not exist then some inconsistency will arise during the process. Then we stop. If no inconsistency arises during the process then $A^{-1}$ will be available on the right side when $A$ reduces to an identity matrix. Hence consider the equation

$$
\begin{bmatrix} 1 & 0 & -1\\ 2 & 1 & 3\\ 3 & 1 & 4 \end{bmatrix} A^{-1}
= \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.
$$


We premultiply both sides by elementary matrices. In the first stage our aim is to 
reduce the first column elements of A to zeros except the first element. This can be 
achieved by a sequence of operations with elementary matrices which in our notation 
can be stated as follows: 


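$$-2(1) + (2);\ -3(1) + (3) \;\Rightarrow$$

[(−2) times the first row added to the second row and (−3) times the first row added to
the third row give]

$$
\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & 5\\ 0 & 1 & 7 \end{bmatrix} A^{-1}
= \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ -3 & 0 & 1 \end{bmatrix}.
$$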


Note that we do the same operations on both sides (same premultiplications by elementary matrices on both sides). The net effect of these multiplications on the left is on the matrix $A$ itself. The final result of this stage of operations is given above. Our next aim is to get rid of the elements in the second column of the resulting $A$ by operating with the second row (or with the help of the element at the (2, 2)-th position). We have the (1, 2)-th element already zero and hence we need to get rid of only the (3, 2)-th element.

$$
-1(2) + (3) \;\Rightarrow\;
\begin{bmatrix} 1 & 0 & -1\\ 0 & 1 & 5\\ 0 & 0 & 2 \end{bmatrix} A^{-1}
= \begin{bmatrix} 1 & 0 & 0\\ -2 & 1 & 0\\ -1 & -1 & 1 \end{bmatrix}.
$$


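The next stage is to get rid of the elements in the third column and at the same time make the element at the (3, 3)-th position 1. This can be done by first dividing the third row by 2 and then, with the help of this new third row, getting rid of the elements at the (2, 3)-th and (1, 3)-th positions. The operations are the following:

$$
\tfrac{1}{2}(3);\ -5(3) + (2);\ (3) + (1) \;\Rightarrow\;
\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} A^{-1}
= \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} & \frac{1}{2}\\ \frac{1}{2} & \frac{7}{2} & -\frac{5}{2}\\ -\frac{1}{2} & -\frac{1}{2} & \frac{1}{2} \end{bmatrix}.
$$

Hence, writing with $\frac{1}{2}$ outside,

$$
A^{-1} = \frac{1}{2}\begin{bmatrix} 1 & -1 & 1\\ 1 & 7 & -5\\ -1 & -1 & 1 \end{bmatrix}.
$$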


The student may verify by multiplying this with A to see whether the product is an 
identity matrix, to make sure that no computational error is made during the process. 


Instead of writing the reduced matrix, $A^{-1}$ and the reduced right side each time, we may simply write the configuration of the elements in $A$ first, then put a vertical line and write an identity matrix. Continue with the operations with the aim of reducing $A$ to an identity matrix, each time doing the same operations on both sides of the vertical line (always premultiplications by elementary matrices on both sides of the vertical line). If $A$ reduces to an identity matrix then what is obtained on the right of the vertical line at this stage is $A^{-1}$.


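Example 2.4.2. Evaluate the regular inverse of $A$ if it exists, where

$$A = \begin{bmatrix} 1 & 0 & -1\\ -3 & 2 & 1\\ -2 & 2 & 0 \end{bmatrix}.$$

Solution 2.4.2. We start with the equation $AA^{-1} = I$ assuming that $A^{-1}$ exists. Let us write the configuration in $A$, a vertical line, the configuration in an identity matrix, in that order. That is,

$$
\left[\begin{array}{rrr|rrr}
1 & 0 & -1 & 1 & 0 & 0\\
-3 & 2 & 1 & 0 & 1 & 0\\
-2 & 2 & 0 & 0 & 0 & 1
\end{array}\right].
$$

Now we start with the operations on both sides, using the same notations as before:

$$
3(1)+(2);\ 2(1)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|rrr}
1 & 0 & -1 & 1 & 0 & 0\\
0 & 2 & -2 & 3 & 1 & 0\\
0 & 2 & -2 & 2 & 0 & 1
\end{array}\right]
$$

$$
-1(2)+(3) \;\Rightarrow\;
\left[\begin{array}{rrr|rrr}
1 & 0 & -1 & 1 & 0 & 0\\
0 & 2 & -2 & 3 & 1 & 0\\
0 & 0 & 0 & -1 & -1 & 1
\end{array}\right].
$$

The last row on the left is null. Whatever linear operations we do on the left, this form cannot be reduced to an identity matrix. Hence $A^{-1}$ does not exist in this case.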
(i) While doing elementary operations on the left if any row of the reduced matrix
on the left becomes null at any stage then there is no regular inverse for the given 
matrix. Hence stop the process then and there when a null vector is obtained. 


(ii) If the given matrix is rectangular then there is no regular inverse and hence do 
not start elementary operations if the aim is to find the regular inverse. 

(iii) While doing elementary operations on the left multiply the rows by appropriate 
numbers (do the same multiplications on both sides of the vertical line) in order to 
avoid fractions. Then at the very end of the operations divide the rows by appro- 
priate numbers to create an identity matrix on the left of the vertical line. This will 
make the computations much easier. 
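The configuration "matrix, vertical line, identity matrix" lends itself to a short program. The sketch below is one possible implementation, not the book's: the name `inverse_by_row_operations` is hypothetical, and instead of delaying divisions as point (iii) suggests for hand computation, it uses Python's exact `Fraction` type so that fractions cause no rounding. It returns `None` when no nonzero pivot can be found, which is when a null row appears on the left.

```python
from fractions import Fraction as F

def inverse_by_row_operations(a):
    """Return the inverse of the square matrix a, or None if it is singular."""
    n = len(a)
    # Build the configuration [A | I] with exact rational entries.
    work = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
            for i, row in enumerate(a)]
    for col in range(n):
        # Interchange rows, if necessary, to get a nonzero pivot.
        pivot = next((r for r in range(col, n) if work[r][col] != 0), None)
        if pivot is None:
            return None  # the left block cannot be reduced to an identity
        work[col], work[pivot] = work[pivot], work[col]
        p = work[col][col]
        work[col] = [v / p for v in work[col]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(n):
            if r != col and work[r][col] != 0:
                factor = work[r][col]
                work[r] = [x - factor * y for x, y in zip(work[r], work[col])]
    return [row[n:] for row in work]

A1 = [[1, 0, -1], [2, 1, 3], [3, 1, 4]]      # matrix of Example 2.4.1
A2 = [[1, 0, -1], [-3, 2, 1], [-2, 2, 0]]    # matrix of Example 2.4.2
print(inverse_by_row_operations(A1))
print(inverse_by_row_operations(A2))
```

The same operations are applied to both halves of each row, which is the programmatic counterpart of premultiplying both sides by the same elementary matrix.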


2.4.2 Checking linear independence through elementary operations 


Consider ordered sets of real numbers defined as vectors. Consider $m$ such $n$-vectors. These $m$ vectors can also be looked upon as the $m$ rows of an $m \times n$ matrix. In such a case we will be checking the linear dependence of the rows of a matrix also. Write the $m$ vectors as a matrix and apply elementary operations on the left, that is, premultiply by elementary matrices. As seen from Chapter 1, linear independence or dependence in a set of vectors is not altered by nonzero scalar multiplications or additions, the two basic elementary operations corresponding to the two types of basic elementary matrices. Also interchanges of rows will not alter the linear independence or dependence in the set. An interchange of rows can be looked upon as a product of elementary operations. Let $m \le n$ for convenience. By elementary operations on the left, with interchanges if necessary, bring $A = (a_{ij})$, the matrix representing the $m$ vectors, to the following form:

$$\begin{bmatrix} I_r & C\\ O & O \end{bmatrix}$$

where $I_r$ is an identity matrix, $r \le m$, $O$ indicates a null matrix and $C$ is a matrix which may or may not be null. Such a reduction is always possible provided there are no null column vectors in the first $r$ columns. If $r = m$ then the null matrices will not be present. Since $I_r$ is an $r \times r$ identity matrix, $r \le m \le n$, the first $r$ rows of the reduced matrix are linearly independent. Thus the maximum number of linearly independent rows is $r$, or the maximum number of linearly independent vectors in the given set of $m$ vectors is $r$. This process will be clear from the following example.


Example 2.4.3. Determine the maximum number of linearly independent row vectors and the maximum number of linearly independent column vectors in the following matrix $A$, where

$$A = \begin{bmatrix} 2 & -3 & 1 & 5\\ 1 & 2 & -1 & 1\\ 3 & -1 & 0 & 6 \end{bmatrix}.$$


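Solution 2.4.3. Since our aim is to check for linear independence or dependence we can interchange the rows, if necessary. Let us interchange rows 1 and 2 to bring a 1 to the (1, 1)-th position. That is,

$$\begin{bmatrix} 1 & 2 & -1 & 1\\ 2 & -3 & 1 & 5\\ 3 & -1 & 0 & 6 \end{bmatrix}.$$

Now we do elementary operations, using our notations introduced earlier:

$$
-2(1)+(2);\ -3(1)+(3) \;\Rightarrow\;
\begin{bmatrix} 1 & 2 & -1 & 1\\ 0 & -7 & 3 & 3\\ 0 & -7 & 3 & 3 \end{bmatrix}
$$

$$
-1(2)+(3) \;\Rightarrow\;
\begin{bmatrix} 1 & 2 & -1 & 1\\ 0 & -7 & 3 & 3\\ 0 & 0 & 0 & 0 \end{bmatrix}.
$$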


Linear independence can be determined at this stage itself without bringing the matrix $A$ to the form $\begin{bmatrix} I_r & C\\ O & O \end{bmatrix}$. The first two rows are linearly independent. Hence the maximum number of linearly independent rows is 2. If we wish to bring the matrix to the above form then do the following operations. Divide the second row by $(-7)$ and then add $(-2)$ times the second row to the first row:

$$
-\tfrac{1}{7}(2);\ -2(2)+(1) \;\Rightarrow\;
\begin{bmatrix} 1 & 0 & -\frac{1}{7} & \frac{13}{7}\\ 0 & 1 & -\frac{3}{7} & -\frac{3}{7}\\ 0 & 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} I_2 & C\\ O & O \end{bmatrix}.
$$

Let

$$e_1 = \begin{bmatrix} 1\\ 0 \end{bmatrix},\quad e_2 = \begin{bmatrix} 0\\ 1 \end{bmatrix}.$$

Then

$$\begin{bmatrix} -1/7\\ -3/7 \end{bmatrix} = -\tfrac{1}{7}e_1 - \tfrac{3}{7}e_2$$

and

$$\begin{bmatrix} 13/7\\ -3/7 \end{bmatrix} = \tfrac{13}{7}e_1 - \tfrac{3}{7}e_2.$$

Hence the last two columns are linearly dependent on the first two columns. Thus the maximum number of linearly independent columns is also 2. This in fact is a general result. Before stating the general results we may observe one aspect. In the above example we had interchanged two rows to start with. Thus the columns are disturbed. Is linear independence of columns affected by such an interchange? Note that at the last stage we may interchange the rows back to the original order. Still we have the two unit vectors $e_1$ and $e_2$ and still the last columns, or the columns in $C$, are linear functions of $e_1$ and $e_2$. Hence linear dependence in the columns is not affected by such an interchange of rows.


Definition 2.4.1. The maximum number of linearly independent row vectors in a ma- 
trix A is called the row rank of A. The maximum number of linearly independent col- 
umn vectors in a matrix A is called the column rank of A. It can be proved that the row 
rank equals the column rank. Then this common rank is called the rank of the matrix. 


(iv) In any matrix the row rank is equal to the column rank. 


This result can be proved without much difficulty. Premultiplication of a matrix by elementary matrices does not alter the linear independence of the system of row vectors or column vectors. By elementary operations, and row interchanges if necessary, bring the given $m \times n$ matrix, $m \le n$, to the form

$$\begin{bmatrix} I_r & C\\ O & O \end{bmatrix}$$

where $O$ indicates a null matrix and $C$ may or may not be null. Let $e_1, \ldots, e_r$ be the $r$ basic unit column vectors. Since $C$ is an $r \times (n - r)$ matrix, every column there is an $r$-vector and hence can be written as a linear combination of the basic unit vectors $e_1, \ldots, e_r$. Since every column vector in $C$ is dependent on $e_1, \ldots, e_r$ the column rank is $r$. Since $I_r$ is present as the first block the first $r$ row vectors are linearly independent, and the remaining are null vectors. Hence the row rank is also $r$. A similar argument holds for the case $m \ge n$.
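The statement that row rank equals column rank can be checked numerically for the matrix of Example 2.4.3. The sketch below counts pivots in a forward elimination with exact fractions; the function name `rank` is an illustrative choice, and the column rank is obtained simply as the row rank of the transpose.

```python
from fractions import Fraction as F

def rank(matrix):
    """Count the nonzero rows left after forward elimination (exact arithmetic)."""
    rows = [[F(v) for v in r] for r in matrix]
    m, n = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[2, -3, 1, 5], [1, 2, -1, 1], [3, -1, 0, 6]]   # matrix of Example 2.4.3
A_transpose = [list(col) for col in zip(*A)]
print(rank(A), rank(A_transpose))
```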


Definition 2.4.2. If the rank of an $m \times n$ or $n \times m$ matrix with $m \le n$ is $m$ then the matrix
is said to be a full rank matrix.


Definition 2.4.3. An $n \times n$ matrix $A$ with rank $n$ is said to be a nonsingular matrix with
a regular inverse $A^{-1}$ such that $AA^{-1} = I_n$, $A^{-1}A = I_n$, and if the rank is $r < n$ then it is
called a singular matrix with no regular inverse $A^{-1}$.

In Example 2.4.3 the rank of the matrix is 2. It is rectangular. It is not a full rank
matrix. If the rank were 3 then we would have called it a full rank matrix. Some of the
immediate consequences of the concept of rank are the following:


(v) The rank of an m x n matrix cannot exceed m or n. The maximum value possible 
is the smaller of m and n. If the maximum is attained then it is a full rank matrix. 
A nonsingular matrix is also a full rank matrix. 


(vi) An n x n matrix is nonsingular iff its rank is n. 

(vii) The rank of a null matrix is zero. 

(viii) The rank of an n x n orthonormal matrix is n. 

(ix) The rank of an $n \times n$ idempotent matrix, not equal to $I_n$, is less than $n$.

(x) The rank of an $n \times n$ nilpotent matrix is less than $n$.


(xi) The rank of any matrix A and the rank of cA are the same where c is a nonzero 
scalar. 
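A few of the consequences above can be spot-checked on small matrices. The matrices below are hypothetical illustrations chosen for this sketch (a null matrix, a row-interchange matrix as an orthonormal example, a diagonal idempotent matrix and a shift matrix as a nilpotent example); the `rank` helper is the same pivot-counting routine used for illustration elsewhere and is not taken from the book.

```python
from fractions import Fraction as F

def rank(matrix):
    """Count pivots in a forward elimination carried out with exact fractions."""
    rows = [[F(v) for v in r] for r in matrix]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

null_matrix = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]   # (vii)
orthonormal = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # (viii) rows are unit and orthogonal
idempotent = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]    # (ix) P*P = P, P != I
nilpotent = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # (x) N^3 = O
A = [[2, -3, 1, 5], [1, 2, -1, 1], [3, -1, 0, 6]]
five_A = [[5 * v for v in row] for row in A]      # (xi) c = 5

ranks = {
    "null": rank(null_matrix),
    "orthonormal": rank(orthonormal),
    "idempotent": rank(idempotent),
    "nilpotent": rank(nilpotent),
    "A": rank(A),
    "5A": rank(five_A),
}
print(ranks)
```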


Exercises 2.4 


2.4.1. Evaluate the ranks of the following matrices: 


21-12 2 1 0 0 
0 1 -1 1 
"PE T ML ae 2 Balen 33-23 
Z LEE NE NE DE M e 
4 2 -2 6 4 0 14 


2.4.2. Evaluate the ranks of AB and CB where A, B, C are given in Exercise 2.4.1. What 
can you say about the rank of a product of two matrices in terms of the ranks of the 
individual matrices? 


2.4.3. Show that the rank of $AB$, where $A$ and $B$ are general matrices with $AB$ defined,
cannot exceed the rank of $A$ or the rank of $B$.


2.4.4. Compute the ranks of (a) A+B, (b) 2A -3B for the A and B given in Exercise 2.4.1. 


2.4.5. Show that for two arbitrary matrices $A$ and $B$ such that $\alpha A + \beta B$ is defined, where
$\alpha$ and $\beta$ are scalars, the rank of $\alpha A + \beta B$ cannot exceed the rank of $A$ plus the rank of $B$.


2.4.6. Show that the only idempotent matrix with full rank is the identity matrix. 


2.4.7. If $A$ is a square matrix what can you say about the rank of $A^2$ in terms of the
rank of $A$? Verify your result for the $A$ of Exercise 2.4.1.


2.4.8. Let $A$ be $n \times n$ and $X$ an $n \times 1$ vector. Show that for the system of linear equations
$AX = O$ to have a non-null solution (at least one $X$ satisfying $AX = O$ is such that $X \neq O$)
the rank of $A$ must be less than $n$. If the rank is $n$ then the only solution is $X = O$.

2.4.9. Let

$$A = \begin{bmatrix} 1 & 0 & -1\\ 1 & 1 & 2\\ 0 & 1 & -1 \end{bmatrix},\quad X = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}$$

and consider the equation $AX = \lambda X$. Find the values of $\lambda$ so that the equation $AX = \lambda X$
has a non-null solution ($X \neq O$).


2.4.10. Evaluate the ranks of the following n x n matrices: 


a b .. b 1 1, Ae a | 
b a .. b 1 | V 
A-|. . _ |, az0bz06azb B=| . ERE 
b b a 1 1 1 
1 1 1 1 a dj ay! 
E EET k 2 n 
C= : > G= : 22 a2 a2 > 
SY jet ET : 
n a Gy. È att 


aps are distinct and nonzero. 


2.4.11. What can you say about the rank of a singly stochastic matrix (the sum of the elements in each row or each column is 1)? Verify your result for $2 \times 2$ and $3 \times 3$ matrices.


2.4.12. Let $A$ and $B$ be nonsingular $n \times n$ matrices. Show that

$$AB,\quad AB^{-1},\quad A^{-1}B,\quad A^{-1}B^{-1}$$

are nonsingular matrices.


2.4.13. Let A and B be n x n nonsingular matrices. Show that A+ B and A - B need not 
be nonsingular. Give two such examples of A and B. 


2.4.14. If $A$ is $m \times n$, $m < n$ and with rank $m$ then show that $AA'$ is nonsingular.

2.4.15. Show that the rank of $AB$ is zero iff $AB = O$.


2.4.16. If $A$ is $m \times n$, $m < n$ and with rank $m$, and if $B$ is $m \times m$ nonsingular and $C$ is $n \times n$
nonsingular, is $BA$ of rank $m$? Is $AC$ of rank $m$ or $n$ or something else?


2.5 Row and column subspaces and null spaces 


Here we examine the subspaces generated by the rows of a given matrix, the columns 
of a given matrix and subspaces which are orthogonal to these. Then we examine the 
bases and dimensions of these subspaces. 
2.5.1 The row and column subspaces


Consider the subspace generated by the row vectors of a given matrix $A$ through the basic operations of scalar multiplication and addition. Let this subspace be denoted by $S_1$. Then $S_1$ is called the row subspace of the matrix $A$. Thus every linear combination of the row vectors of $A$ is in $S_1$. That is, all the rows are in $S_1$, every scalar multiple of every row vector is in $S_1$, and every sum of such scalar multiples is also in $S_1$. Now, consider the subspace generated by the columns of $A$. Then this subspace, denoted by $S_2$, is called the column subspace of $A$. Thus every linear combination of the columns of $A$ is in $S_2$.

If $A$ is an $m \times n$ matrix then every vector in $S_1$ is an $n$-vector whereas every vector in $S_2$ is an $m$-vector. Since the maximum number of linearly independent row vectors is $r$, the rank of $A$, which is also equal to the maximum number of linearly independent column vectors of $A$, the dimension of $S_1$ is $r$, which is also equal to the dimension of $S_2$. Consider the equation

$$AX = O \tag{2.5.1}$$

where $A$ is the given $m \times n$ matrix, $X$ is an $n \times 1$ vector of unknowns (variables or parameters) and $O$ is the null vector. [$AX = O$ is also called the homogeneous system of linear equations.] In (2.5.1) each row vector is orthogonal to each solution vector $X$. Consider the set of all such solutions, $\{X\}$, that is, the set of all possible $X$ satisfying (2.5.1).


Definition 2.5.1 (The null space). The set of all possible solutions, $\{X\}$, of the equations in (2.5.1) is called the null space or the right null space of the matrix $A$.


Definition 2.5.2 (The left null space). The solution space $\{Y\}$ of the equation $A'Y = O$,
where $A'$ is the transpose of $A$, is called the left null space of $A$.


Let us denote the null space or the right null space by $S_3$ and the left null space by $S_4$. Then each vector in $S_3$ is an $n$-vector and it is orthogonal to each vector in the row subspace $S_1$. Similarly, each vector in $S_4$ is an $m$-vector and it is orthogonal to each vector in the column subspace $S_2$. The subspaces $S_1$ and $S_3$ are orthogonal to each other. Also the subspaces $S_2$ and $S_4$ are orthogonal to each other. Not only that, $S_3$ is the orthogonal complement of $S_1$ and $S_4$ is the orthogonal complement of $S_2$. (See Chapter 1 for the definitions.)


Definition 2.5.3 (Orthogonal complements). If two subspaces of $n$-vectors $S$ and $S^*$ are such that the dimension of $S$ is $r$, the dimension of $S^*$ is $n - r$ and further, $S$ and $S^*$ are orthogonal to each other, then $S^*$ is called the orthogonal complement of $S$, and vice versa.

We will use the notation $\perp$ to denote orthogonality. Then we have

$$
\begin{aligned}
&S_1 \perp S_3 \ \text{ or } \ S_3 \perp S_1 \quad \text{(orthogonal to each other)},\\
&S_2 \perp S_4 \ \text{ or } \ S_4 \perp S_2 \quad \text{(orthogonal to each other)}.
\end{aligned}
$$

The maximum number of linearly independent $n$-vectors is $n$. Also from Chapter 1 we know that orthogonal vectors are linearly independent. The dimension of $S_1$ is $r$, the row rank of $A$. Since $S_3$ is the orthogonal complement of $S_1$ the dimension of $S_3$ is $n - r$. Similarly the dimension of $S_4$ is $m - r$, the number of rows minus the column rank of $A$. Thus we have the following results:


(i) The row subspace $S_1$ of an $m \times n$ matrix $A$ and the null space (the right null space) $S_3$ are of dimensions $r$ and $n - r$ respectively, where $r$ is the rank of $A$, and further, $S_1$ and $S_3$ are orthogonal complements of each other. Similarly the column subspace $S_2$ and the left null space $S_4$ are of dimensions $r$ and $m - r$ respectively and further, $S_2$ and $S_4$ are orthogonal complements of each other.

(ii)

$$
\begin{aligned}
\text{dimension of } S_1 + \text{dimension of } S_3 &= n,\\
\text{dimension of } S_2 + \text{dimension of } S_4 &= m.
\end{aligned}
$$

Example 2.5.1. Obtain 2 bases each for the row subspace $S_1$, the column subspace $S_2$ and the null space of $A$, where

$$A = \begin{bmatrix} 1 & 1 & 0 & 1\\ 2 & -1 & 1 & 2\\ 4 & 1 & 1 & 4 \end{bmatrix}.$$

Solutions 2.5.1. In order to determine the rank or establish a basis for the row or column subspace we proceed as before, namely do elementary operations, since the linear independence or dependence in the set of row or column vectors is unaltered by these operations. Also we use the same notations as before:

$$
-2(1)+(2);\ -4(1)+(3) \;\Rightarrow\;
\begin{bmatrix} 1 & 1 & 0 & 1\\ 0 & -3 & 1 & 0\\ 0 & -3 & 1 & 0 \end{bmatrix}
$$

$$
-1(2)+(3) \;\Rightarrow\;
\begin{bmatrix} 1 & 1 & 0 & 1\\ 0 & -3 & 1 & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{$\alpha$}
$$

At this stage we know that the row rank $= r = 2 =$ the column rank. Hence the dimension of the row subspace $S_1$ is 2, that of the null space $S_3$ is $4 - 2 = 2$, that of the column subspace $S_2$ is 2 and that of the left null space $S_4$ is $3 - 2 = 1$. Any two linearly independent row vectors form a basis for the row subspace $S_1$. Two such bases are then two such collections of two linearly independent row vectors of $A$. Many such sets can be constructed. One such set is already in ($\alpha$). Two such bases are the following:


(1,1,0,1) (1,1,0,1) 
1 ; 2 j 
i I z aa 
Any two linearly independent column vectors of A is a basis for the column subspace 
S,. Two such bases are the following: 


1 i 1 
(1) 2),4 rhs @ 2 |, 
4 1 4 


Every vector in the null space $S_3$ is orthogonal to every vector in $S_1$. There are sets of two such linearly independent vectors. A general method of constructing a basis for $S_3$ would be to take a basis in $S_1$ and construct two orthogonal vectors. In our example here a simpler way of doing it is to reduce the reduced form of $A$, given in ($\alpha$) above, further. Divide the second row by $(-3)$ and add $(-1)$ times the second row to the first row to obtain

$$
\begin{bmatrix} 1 & 0 & \frac{1}{3} & 1\\ 0 & 1 & -\frac{1}{3} & 0\\ 0 & 0 & 0 & 0 \end{bmatrix}. \tag{$\beta$}
$$

Then a vector in the null space $S_3$ must be orthogonal to the vector $(1, 0, \frac{1}{3}, 1)$ as well as to $(0, 1, -\frac{1}{3}, 0)$. Let $V$, $V' = (a, b, c, d)$, be a vector in $S_3$. Then

$$\left(1, 0, \tfrac{1}{3}, 1\right)V = 0 \;\Rightarrow\; a + \tfrac{1}{3}c + d = 0$$

and

$$\left(0, 1, -\tfrac{1}{3}, 0\right)V = 0 \;\Rightarrow\; b - \tfrac{1}{3}c = 0.$$

Two solutions are

$$(a, b, c, d) = (-2, 1, 3, 1),\ (-3, 1, 3, 2).$$

Two other solutions are

$$(a, b, c, d) = (-4, 1, 3, 3),\ (-5, 1, 3, 4).$$

Hence two bases for the null space $S_3$ are the following:

$$
(1)\ \begin{bmatrix} -2\\ 1\\ 3\\ 1 \end{bmatrix},\ \begin{bmatrix} -3\\ 1\\ 3\\ 2 \end{bmatrix};
\qquad
(2)\ \begin{bmatrix} -4\\ 1\\ 3\\ 3 \end{bmatrix},\ \begin{bmatrix} -5\\ 1\\ 3\\ 4 \end{bmatrix}.
$$
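The two bases of the null space can be checked directly: every basis vector must be sent to the null vector by $A$, and the two vectors in each basis must be linearly independent. The sketch below does both for the matrix of Example 2.5.1; the helper names `matvec` and `independent` are hypothetical.

```python
A = [[1, 1, 0, 1], [2, -1, 1, 2], [4, 1, 1, 4]]   # matrix of Example 2.5.1
basis_1 = [(-2, 1, 3, 1), (-3, 1, 3, 2)]
basis_2 = [(-4, 1, 3, 3), (-5, 1, 3, 4)]

def matvec(a, v):
    """Product of the matrix a with the column vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def independent(u, v):
    """Two vectors are linearly independent iff some 2 x 2 minor is nonzero."""
    n = len(u)
    return any(u[i] * v[j] - u[j] * v[i] != 0
               for i in range(n) for j in range(i + 1, n))

for basis in (basis_1, basis_2):
    print([matvec(A, v) for v in basis], independent(*basis))
```

Since the null space has dimension $4 - 2 = 2$, two independent vectors lying in it are enough to form a basis.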
If two bases for the left null space $S_4$ are needed then we can start with $A$. Since we have done row operations to reduce $A$ to ($\beta$), ($\beta$) is not a suitable starting point here. Take any arbitrary column vector $U$, $U' = (a, b, c)$. Then $U$ must be orthogonal to the columns of $A$. Take any two linearly independent columns, say the second and the third columns in $A$. Then take the dot products with $U$. That is,

$$a - b + c = 0,\quad b + c = 0.$$

Two solutions are the following:

$$
\begin{bmatrix} a\\ b\\ c \end{bmatrix} = \begin{bmatrix} 2\\ 1\\ -1 \end{bmatrix},\ \begin{bmatrix} -2\\ -1\\ 1 \end{bmatrix}.
$$

Note that the dimension of $S_4$ is 1 and hence all vectors in $S_4$ will be scalar multiples of $U$. Then, for example, two bases for $S_4$ are the two vectors given above. [A basis consists of only one vector in this case.]
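A quick check of the left null space: a vector $U$ belongs to $S_4$ when $U'A$ is the null row vector, that is, when $U$ is orthogonal to every column of $A$, not only to the two columns used to find it. The sketch below, with the hypothetical helper `row_times_matrix`, confirms this for both vectors and shows that one is a scalar multiple of the other.

```python
A = [[1, 1, 0, 1], [2, -1, 1, 2], [4, 1, 1, 4]]   # matrix of Example 2.5.1
U1 = (2, 1, -1)
U2 = (-2, -1, 1)

def row_times_matrix(u, a):
    """Compute u'A: the dot product of u with each column of a."""
    return [sum(ui * row[j] for ui, row in zip(u, a)) for j in range(len(a[0]))]

print(row_times_matrix(U1, A), row_times_matrix(U2, A))
print([-x for x in U1] == list(U2))   # U2 is (-1) times U1
```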


Example 2.5.2. Show that every row vector in $A$ of Example 2.5.1 can be written as a linear function of the vectors in each basis for the row subspace $S_1$ constructed in Example 2.5.1 and that each column vector in $A$ can be written as a linear function of the vectors in each basis of $S_2$ there.


Solutions 2.5.2. Let us start with the basis (1) of the row subspace $S_1$. Then the vector

$$(4, 1, 1, 4) = 2(1, 1, 0, 1) + (2, -1, 1, 2).$$

Hence all the row vectors are expressible as linear functions of the vectors in each basis of $S_1$. Now let us consider the basis (1) of the column subspace $S_2$.


1 1 0 

aj 2]|«b| -1 1 ^a ee or 
4 1 1 2 2 
1 1 0 

ES Veh hella 

Sla ? 1 
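2.5.2 Consistency of a system of linear equations


A system of linear equations can be written as $AX = b$, that is,

$$
\begin{bmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
\begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix}
= \begin{bmatrix} b_1\\ \vdots\\ b_m \end{bmatrix} \tag{2.5.2}
$$

where $A$ and $b$ are known and $X$ is the unknown part. Writing the equations in (2.5.2) in a slightly different equivalent form we have

$$
x_1\begin{bmatrix} a_{11}\\ \vdots\\ a_{m1} \end{bmatrix}
+ x_2\begin{bmatrix} a_{12}\\ \vdots\\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n\begin{bmatrix} a_{1n}\\ \vdots\\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1\\ \vdots\\ b_m \end{bmatrix}. \tag{2.5.3}
$$

Note that (2.5.2) and (2.5.3) are equivalent representations of $AX = b$. If there is a solution for $AX = b$ then we have a set of numbers for

$$X' = (x_1, \ldots, x_n) = (c_1, \ldots, c_n)$$

and for this set (2.5.3) becomes

$$
c_1\begin{bmatrix} a_{11}\\ \vdots\\ a_{m1} \end{bmatrix}
+ c_2\begin{bmatrix} a_{12}\\ \vdots\\ a_{m2} \end{bmatrix}
+ \cdots
+ c_n\begin{bmatrix} a_{1n}\\ \vdots\\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1\\ \vdots\\ b_m \end{bmatrix}.
$$

In other words, the vector $b$ is a linear combination of the column vectors of the matrix $A$, or $b$ is an element of the column subspace $S_2$ of $A$. Therefore for the system $AX = b$ to have a solution, $b$ must be an element of $S_2$.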


(iii) The system of linear equations $AX = b$ is consistent (has at least one solution)
if and only if $b$ is an element of the column subspace $S_2$ of $A$.
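One practical way to test whether $b$ lies in the column subspace is to compare ranks: appending $b$ to $A$ as an extra column leaves the rank unchanged exactly when $b$ is a linear combination of the columns of $A$. This rank comparison is an equivalent restatement used here for illustration, not a method stated in the text; the names `rank` and `is_consistent` and the second right-hand side are assumptions of the sketch.

```python
from fractions import Fraction as F

def rank(matrix):
    """Count pivots in a forward elimination carried out with exact fractions."""
    rows = [[F(v) for v in r] for r in matrix]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        if r == m:
            break
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_consistent(a, b):
    """AX = b is consistent iff appending b as a column does not raise the rank."""
    augmented = [list(row) + [rhs] for row, rhs in zip(a, b)]
    return rank(a) == rank(augmented)

A = [[1, 1, 0, 1], [2, -1, 1, 2], [4, 1, 1, 4]]   # matrix of Example 2.5.1
print(is_consistent(A, [2, 1, 5]))   # sum of the first two columns of A
print(is_consistent(A, [1, 0, 0]))   # hypothetical b outside the column subspace
```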


Now, let us examine the general solution for the system $AX = b$. Consider the homogeneous system

$$AX = O \tag{2.5.4}$$

and consider all solutions of this homogeneous system. There are $n - r$ linearly independent solutions for (2.5.4) when $A$ is $m \times n$ and when $r$ is the rank of $A$. Let one set of such linearly independent solutions be denoted by $X_{(1)}, \ldots, X_{(n-r)}$. That is,

$$AX_{(i)} = O,\quad \text{for } i = 1, 2, \ldots, n - r.$$

Take a general linear combination of $X_{(i)}$, $i = 1, \ldots, n - r$, say

$$Y = d_1X_{(1)} + \cdots + d_{n-r}X_{(n-r)} \tag{2.5.5}$$

where $d_1, \ldots, d_{n-r}$ are arbitrary constants. Then

$$AY = O. \tag{2.5.6}$$

Let $X_0$ be a particular solution of $AX = b$. That is,

$$AX_0 = b. \tag{2.5.7}$$

Combining (2.5.6) and (2.5.7) we have

$$A(Y + X_0) = O + b = b \tag{2.5.8}$$

or $Y + X_0$ is a solution of the starting system $AX = b$. Thus a general solution of $AX = b$ is available as follows:


(iv) The general solution of the linear system $AX = b$ is $Y + X_0$ where $X_0$ is a particular solution of $AX = b$ and $Y$ is the general solution of $AX = O$.


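Example 2.5.3. Consider the matrix $A$ in Example 2.5.1 and consider the system of linear equations

$$
\begin{bmatrix} 1 & 1 & 0 & 1\\ 2 & -1 & 1 & 2\\ 4 & 1 & 1 & 4 \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4 \end{bmatrix}
= \begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix}
\quad\text{or}\quad AX = b.
$$

(i) Find one vector $b_0$ so that the system is consistent.
(ii) Find the general solution of the system $AX = b_0$.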


Solution 2.5.3. (i) For the system to be consistent b must be an element of the column
subspace $S_2$ of A. That is, b must be a linear function of the vectors in a basis of $S_2$.
Since the rank of A is 2, any two linearly independent columns of A form a basis of $S_2$. For
example take

$$\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}, \quad \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$$

and b as the sum,

$$b = \begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix} + \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix}.$$

Many such b's can be constructed. (ii) A particular solution of this system

$$\begin{bmatrix} 1 & 1 & 0 & 1 \\ 2 & -1 & 1 & 2 \\ 4 & 1 & 1 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix} \tag{2.5.9}$$

can be taken either from the representation (α) or from (β) of Example 2.5.1. Taking
from (β), a particular solution is $X_0' = (0, 2, 3, 0)$. The general solution of the system AX = O
is a general linear combination of the vectors in a basis for the null space $S_3$. Taking
the basis (1) of $S_3$ in Example 2.5.1 a general linear combination is

$$c \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix} + d \begin{bmatrix} -3 \\ 1 \\ 3 \\ 2 \end{bmatrix}$$

where c and d are arbitrary constants. Therefore the general solution of the system
(2.5.9), denoted by Z, is

$$Z = \begin{bmatrix} 0 \\ 2 \\ 3 \\ 0 \end{bmatrix} + c \begin{bmatrix} -2 \\ 1 \\ 3 \\ 1 \end{bmatrix} + d \begin{bmatrix} -3 \\ 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} -2c - 3d \\ c + d + 2 \\ 3c + 3d + 3 \\ c + 2d \end{bmatrix}$$

where c and d are arbitrary constants. [The student may substitute back in (2.5.9) and
verify the result.]
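
The substitution suggested above can also be carried out mechanically. The following is a minimal Python sketch, not part of the original text, that checks AZ = b for several arbitrary choices of c and d. The helper names `matvec` and `general_solution` are illustrative assumptions, and exact rational arithmetic is used so that no rounding is involved.

```python
from fractions import Fraction

# Coefficient matrix and right-hand side of the system (2.5.9).
A = [[1, 1, 0, 1],
     [2, -1, 1, 2],
     [4, 1, 1, 4]]
b = [2, 1, 5]


def matvec(M, x):
    """Return the product M x for a matrix M (list of rows) and a vector x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def general_solution(c, d):
    """Particular solution plus a general combination of two null-space vectors."""
    x0 = [0, 2, 3, 0]      # particular solution of AX = b
    n1 = [-2, 1, 3, 1]     # first basis vector of the null space
    n2 = [-3, 1, 3, 2]     # second basis vector of the null space
    return [p + c * u + d * v for p, u, v in zip(x0, n1, n2)]


# Every choice of the arbitrary constants must reproduce b exactly.
for c, d in [(0, 0), (1, -1), (Fraction(1, 2), 7), (-3, Fraction(5, 3))]:
    assert matvec(A, general_solution(c, d)) == b
```

Since the check passes for constants chosen freely, including fractions, it illustrates result (iv): the particular solution carries b, while the null-space part contributes nothing to AX.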


Definition 2.5.4. A linear system of equations AX = b, where A is n x n, is said to be 
a singular system if A is singular, and a nonsingular system if A is nonsingular. 


Exercises 2.5 


2.5.1. Show that interchange of two rows in an m x n matrix can be effected by pre- 
multiplying it by a product of elementary matrices. 


2.5.2. Compute the ranks of the following matrices: 


3 1 0 -1 2 sate oh oe 
401-1 1 «es 
A X a 
2410 1 d: oe. lub di 
10 1 1 1 
2- Oo qu 
12 1 1 
C-|3 2 2 0 
5 23 -1 


2.5.3. Construct 3 bases (not scalar multiples of each other) for (1) the row subspace
$S_1$ and (2) the column subspace $S_2$ for each of the matrices in Exercise 2.5.2.


2.5.4. Construct 3 bases for the null space $S_3$ and the left null space $S_4$ for each of the
matrices in Exercise 2.5.2.


2.5.5. Check for linear independence and determine the maximum number of linearly
independent vectors in each of the following sets:


(a) U;-(LL-12, U, = (1,1,0,1), 
U3 =(2,-1,1,4), U, = (1,2,1,2), 


U; = (0,1, 21,1). 
1 2 0 
zi -1 1 1 
b) W-|2| V-|1|, Vs la, v=l1 
1 1 0 
1 1 2 1 


2.5.6. For each of the matrices in Exercise 2.5.2 write the systems of linear equations
as $AX = b_1$, $BY = b_2$, $CZ = b_3$. (a) Construct two different vectors for each of $b_1, b_2, b_3$ so
that the systems are consistent. (b) Construct two vectors for each of $b_1, b_2, b_3$ so that
the systems are not consistent. (c) For your answers in (a) evaluate the most general
solutions for each of the systems.


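2.5.7. Determine the product of the basic elementary matrices of the E and F types so
that the matrix on the left below is transformed into the one on the right:

$$\begin{bmatrix} 0 & 1 & 2 & 3 \\ 1 & -1 & 2 & 4 \\ 2 & 1 & -1 & 1 \\ 3 & 2 & 1 & 4 \end{bmatrix} \Rightarrow \begin{bmatrix} 0 & 1 & 2 & 3 \\ 2 & 1 & -1 & 1 \\ 3 & 2 & 1 & 4 \\ 1 & -1 & 2 & 4 \end{bmatrix}$$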


2.5.8. Let U and V be n × 1 vectors and let A = VV', B = UV'. Show that the n × n
matrices A and B have rank 1.


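2.5.9. Show that the Vandermonde matrix

$$V = \begin{bmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{n-1} \\ 1 & a_2 & a_2^2 & \cdots & a_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^{n-1} \end{bmatrix}$$

is nonsingular, where $a_i \neq 0$, i = 1, ..., n and the $a_j$'s are distinct.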


2.5.10. For a 3 × 3 non-null matrix A, the dot product of each row with the vector (1, -1, 1) is
zero. What can you say about the rank of A? What can you say about the dimension of
the null space?


2.5.11. Determine the rank and construct a basis each for (1) row subspace, (2) column
subspace, (3) right null space, (4) left null space of the matrix

$$A = \begin{bmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & & \vdots \\ b & b & \cdots & a \end{bmatrix}, \quad a \neq b,\ a \neq 0,\ b \neq 0.$$


2.5.12. In a non-null 3 × 3 matrix A each row is orthogonal to the vectors
1 
V,=| 0], Vs 
1 


Determine the rank of A and construct a basis for the row subspace of A. 


2.5.13. Consider the linear system of equations AX = b where 


A= 


PNR 
Orr 


1 
2 
1 


(a) Construct two examples of b where the system AX = b is not consistent; (b) Con- 
struct two examples of b where the system is consistent; (c) In (b) construct one basis 
each for the right null space of A; (d) In (b) construct the general solution in each case. 


2.5.14. Let J be the n × 1 column vector of unities, that is, J' = (1, 1, ..., 1). Let

$$A = \frac{1}{n} JJ' \quad \text{and} \quad B = I_n - \frac{1}{n} JJ'.$$


Evaluate the ranks of A and B. 


2.5.15. Construct one basis each for the right null space of the matrices A and B in 
Exercise 2.5.14. 


2.5.16. Show that any matrix A of rank r can be written as

$$A = R \begin{bmatrix} I_r & O \\ O & O \end{bmatrix} S$$

where R and S are nonsingular matrices.


2.5.17. If A and B are rectangular matrices of the same rank then show that there exist 
two nonsingular matrices R and S such that 


B= RAS. 


2.5.18. If A and B are nonsingular matrices and C is any matrix then show that the
matrices C, AC, CB, ACB have the same rank as long as the multiplications are defined.


2.6 Permutations and elementary operations on the right


So far we were dealing exclusively with operating on the left with elementary matrices, 
that is, premultiplications of a matrix with elementary matrices. By this time the stu- 
dent may be very clear about the effects of such premultiplications on a given matrix. 
Now we consider postmultiplications by elementary matrices. 


2.6.1 Permutations 


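It is already seen that if we wish to permute or interchange rows we can effect that with
a product of the basic elementary matrices. Consider the matrices

$$G = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \quad F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix},$$

$$F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

$$F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Then it is easily seen that

$$G = E_3 F_3 E_2 F_2 E_1 F_1,$$

a product of the basic elementary matrices. If we premultiply an arbitrary matrix with
G then we have

$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}.$$

That is, the second and the third rows are interchanged as it is the case in G, or G can be
looked upon as an identity matrix with the second and the third rows interchanged.
Thus, in general, permutations of the rows can be achieved by premultiplication with
a product of elementary matrices.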
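
As a quick check of the claim that a row interchange is a product of basic elementary matrices, here is a small Python sketch. It is an illustration only: the six factors are written out under the assumption that the F-type matrices add one of rows 2, 3 to the other and the E-type matrices multiply a row by -1, and the helper `matmul` is a hypothetical name introduced here.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


# F-type: one row of the identity added to another row.
# E-type: one row of the identity multiplied by a nonzero scalar.
F1 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]    # row 2 added to row 3
E1 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # row 3 multiplied by -1
F2 = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]    # row 3 added to row 2
E2 = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]   # row 2 multiplied by -1
F3 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]    # row 2 added to row 3
E3 = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]   # row 3 multiplied by -1

# Build G = E3 F3 E2 F2 E1 F1 by premultiplying one factor at a time.
G = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for M in (F1, E1, F2, E2, F3, E3):
    G = matmul(M, G)

# A test matrix whose entry "ij" records its row i and column j.
A = [[11, 12, 13], [21, 22, 23], [31, 32, 33]]
GA = matmul(G, A)

assert G == [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
assert GA == [[11, 12, 13], [31, 32, 33], [21, 22, 23]]
```

The intermediate products show the usual trick: three additions and three sign changes move each of the two rows into the other's place without ever using an explicit swap.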


2.6.2 Postmultiplications by elementary matrices 


The technique of postmultiplications is postponed this far mainly to give the student
time to let the premultiplication ideas sink in clearly. Otherwise there is a great
chance of getting confused about the effects of pre and post multiplications. Let us
start with the basic elementary matrices of the E and F types. Let

$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

a matrix obtained by multiplying the second row of an identity matrix by 4. $E_1$ operating
on the left has the effect that the second row is multiplied by 4. Observe that $E_1$
can also be looked upon as created by multiplying the second column of an identity
matrix by 4. Let us see what happens when $E_1$ operates on the right:

$$AE_1 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & 4a_{12} & a_{13} \\ a_{21} & 4a_{22} & a_{23} \\ a_{31} & 4a_{32} & a_{33} \end{bmatrix}.$$

That is, the second column is multiplied by 4. Thus we have the following result:


(i) If E is a basic elementary matrix created by multiplying the i-th column of an
identity matrix by the nonzero scalar c then any matrix A postmultiplied by E will
have the effect that the i-th column of A is multiplied by c.


Now, let us consider an elementary matrix of the F type. Let

$$F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

obtained by adding the first row of $I_3$ to the second row, which can also be considered
as obtained by adding the second column of $I_3$ to the first column. Let us see what
happens if A is postmultiplied by $F_1$:

$$AF_1 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} + a_{12} & a_{12} & a_{13} \\ a_{21} + a_{22} & a_{22} & a_{23} \\ a_{31} + a_{32} & a_{32} & a_{33} \end{bmatrix}.$$

That is, the second column is added to the first column, exactly as $F_1$ is obtained by
adding the second column of $I_3$ to the first column. Now see the net effect of operating
on the right with $F_1'$, the transpose of $F_1$. This transpose could also be looked upon as
obtained by adding the first column of $I_3$ to the second column:

$$AF_1' = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} + a_{11} & a_{13} \\ a_{21} & a_{22} + a_{21} & a_{23} \\ a_{31} & a_{32} + a_{31} & a_{33} \end{bmatrix}.$$

Now the first column is added to the second column. Thus $F_1'$ is the matrix to operate
on the right if the same type of effects on the columns is needed as the effects on the
rows when $F_1$ operates on the left.


(ii) Let F be an elementary matrix obtained by adding the i-th row to the j-th row
in $I_n$. Let A be an n × n arbitrary matrix. Then FA has the effect that the i-th row of A
is added to the j-th row of A. AF has the effect that the j-th column is added to the
i-th column. AF' has the effect that the i-th column is added to the j-th column. [F'
is also the elementary matrix obtained by adding the i-th column to the j-th column
of $I_n$.]

(iii) If $F_1$ is an elementary matrix obtained by adding the i-th column of $I_n$ to the j-th
column of $I_n$, then any arbitrary n × n matrix A postmultiplied by $F_1$ has the effect
that the i-th column of A is added to the j-th column of A.


When premultiplying a matrix A by an elementary matrix then create the elementary 
matrix by operating on the rows of the identity matrix I. The effect will be exactly the 
same on the rows of A. When postmultiplying A with an elementary matrix then create 
the elementary matrix by operating on the columns of I. The effect will be exactly the 
same on the columns of A. The student is urged to memorize these properties. These 
will be helpful when trying to reduce a matrix to a triangular or diagonal form. 
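
The rule above (rows of I for premultiplication, columns of I for postmultiplication) can be tried out numerically. The following Python sketch is only an illustration; the function `row_adder` and the test matrix are assumptions introduced here, with 0-based indices in the code standing for the 1-based rows and columns of the text.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*M)]


def row_adder(n, i, j):
    """Elementary matrix F: row i of I_n added to row j of I_n (0-based, i != j)."""
    F = identity(n)
    F[j][i] += 1
    return F


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
F = row_adder(3, 0, 1)           # first row of I added to its second row

FA = matmul(F, A)                # effect on rows: row 1 of A added to row 2
AF = matmul(A, F)                # effect on columns: column 2 added to column 1
AFt = matmul(A, transpose(F))    # effect on columns: column 1 added to column 2

assert FA == [[1, 2, 3], [5, 7, 9], [7, 8, 9]]
assert AF == [[3, 2, 3], [9, 5, 6], [15, 8, 9]]
assert AFt == [[1, 3, 3], [4, 9, 6], [7, 15, 9]]
```

Note how AF and AF' differ: the same matrix F that adds row 1 to row 2 on the left adds column 2 to column 1 on the right, so the transpose is needed to get the matching column effect.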


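Example 2.6.1. Write the following symmetric matrix A in the form A = QDQ' where
D is a diagonal matrix, Q is a nonsingular matrix and Q' the transpose of Q, where

$$A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ -1 & 4 & 4 \end{bmatrix}.$$


Solution 2.6.1. Let

$$F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \quad \text{or} \quad (1) + (3) \Rightarrow;$$

$$F_1 A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ 0 & 4 & 3 \end{bmatrix}, \quad F_1 A F_1' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 4 \\ 0 & 4 & 3 \end{bmatrix}.$$

Observe that by operating on the left by $F_1$ and on the right by its transpose $F_1'$ the
symmetric nature of the resulting matrix is maintained. If we had operated on the
right first and then on the left we would have got the same result. [The student may
verify this aspect.] Now our aim is to get rid of the elements at the (3, 2)-th and (2, 3)-th
positions while at the same time maintaining symmetry. This can be achieved by doing
the following operations. Let

$$G_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} \quad \text{or} \quad -2(2) + (3) \Rightarrow;$$

$$G_1 (F_1 A F_1') = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 4 \\ 0 & 0 & -5 \end{bmatrix}, \quad G_1 (F_1 A F_1') G_1' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -5 \end{bmatrix}.$$

That is,

$$A = F_1^{-1} G_1^{-1} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -5 \end{bmatrix} (F_1^{-1} G_1^{-1})'.$$

But, by inspection

$$F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix},$$

$$F_1^{-1} G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix} = Q.$$

Therefore

$$A = QDQ', \quad Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -5 \end{bmatrix}.$$


[The student may verify the result by straight multiplication of Q, D and Q'.]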
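
The paired row and column operations of this solution can be automated. The sketch below is a minimal Python illustration, not the book's own procedure: the function name `congruence_diagonalize` is hypothetical, exact fractions are used, and it assumes that every pivot met on the diagonal is nonzero (a zero pivot with nonzero entries below it would need an extra step and is simply rejected here).

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*M)]


def congruence_diagonalize(A):
    """Return (Q, D) with A = Q D Q' for a symmetric A, assuming nonzero pivots."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    Q = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if M[k][k] == 0:
            if all(M[i][k] == 0 for i in range(k + 1, n)):
                continue  # nothing to eliminate in this column
            raise ValueError("zero pivot: this sketch does not handle it")
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(n):          # row operation: -m (row k) + (row i)
                M[i][j] -= m * M[k][j]
            for j in range(n):          # same operation on the columns
                M[j][i] -= m * M[j][k]
            Q[i][k] = m                 # the inverse operation is recorded in Q
    return Q, M


A = [[1, 0, -1], [0, 2, 4], [-1, 4, 4]]
Q, D = congruence_diagonalize(A)

assert Q == [[1, 0, 0], [0, 1, 0], [-1, 2, 1]]
assert D == [[1, 0, 0], [0, 2, 0], [0, 0, -5]]
assert matmul(matmul(Q, D), transpose(Q)) == A
```

For the matrix of Example 2.6.1 the sketch reproduces Q and D found above, and the last assertion is the "straight multiplication" check suggested to the student.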


Note that when the matrix A is symmetric we need to consider only a triangularization
of A by premultiplication with elementary matrices. The transpose of the product
of the elementary matrices on the left is going to be the matrix on the right. Hence there
is no need to evaluate the inverses of the transposes of the elementary matrices also.
In the above example we obtained $Q = F_1^{-1} G_1^{-1}$ and Q' is the matrix on the right. Also,
Q being a product of elementary matrices will be nonsingular whereas the diagonal
matrix D need not be nonsingular. If the matrix A is singular then there will be at least
one zero diagonal element in D and if A is nonsingular then all diagonal elements in
D will be nonzeros.


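Example 2.6.2. Reduce the following matrix A to the form A = PDQ where P and Q
are nonsingular matrices and D is a diagonal matrix, where

$$A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ -1 & 2 & 4 \end{bmatrix}.$$


Solution 2.6.2. Note that A is not symmetric here. Let

$$F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.$$

Then

$$F_1 A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ 0 & 2 & 3 \end{bmatrix}, \quad F_1 A F_1' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 4 \\ 0 & 2 & 3 \end{bmatrix}.$$

Let

$$G_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then

$$G_1 F_1 A F_1' = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 4 \\ 0 & 0 & -1 \end{bmatrix}, \quad G_1 F_1 A F_1' G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix} = D.$$

Hence

$$A = F_1^{-1} G_1^{-1} D\, G_2^{-1} (F_1')^{-1}.$$

The inverses are available by inspection. That is,

$$F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}, \quad G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad G_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \quad (F_1')^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then

$$P = F_1^{-1} G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 1 & 1 \end{bmatrix},$$

$$Q = G_2^{-1} (F_1')^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix},$$

$$A = PDQ.$$


[In order to check for possible computational errors verify the result by straight multiplication.]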


(iv) If there is a zero at the (1, 1)-th position then add the i-th row to the first row as 
well as the i-th column to the first column if the element at the (i, 1)-th position is 
nonzero. This will bring a nonzero element to the (1,1)-th position as well as keep 
the symmetry. If symmetry is not to be maintained then do only the first operation 
above. Repeat a similar process at every stage when dealing with a zero diagonal 
element. 


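Example 2.6.3. Reduce the same A in Example 2.6.2 to the form A = DQ where D is
diagonal and Q is nonsingular.


Solution 2.6.3. Since we want a form DQ we operate only on the right of A with elementary
matrices. Let

$$F_1 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

$$G_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{bmatrix}, \quad F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad G_2 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

These are created from an identity matrix as follows:
$F_1$: the first column is added to the third column;
$E_1$: the second column is divided by 2;
$F_2$: the second column is added to the first column;
$G_1$: (-4) times the second column is added to the third column;
$F_3$: the third column is added to the second column;
$G_2$: (-1) times the second column is added to the first column.


When postmultiplying A with these elementary matrices the effects will be exactly the
same on the columns of A:

$$AF_1 = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ -1 & 2 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 4 \\ -1 & 2 & 3 \end{bmatrix};$$

$$AF_1E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ -1 & 1 & 3 \end{bmatrix}; \quad AF_1E_1F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 4 \\ 0 & 1 & 3 \end{bmatrix};$$

$$AF_1E_1F_2G_1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & -1 \end{bmatrix}; \quad AF_1E_1F_2G_1F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Hence

$$AF_1E_1F_2G_1F_3G_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Therefore,

$$A = D\, G_2^{-1} F_3^{-1} G_1^{-1} F_2^{-1} E_1^{-1} F_1^{-1}$$

where

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}, \quad G_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad F_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix},$$

$$F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad G_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 0 & 0 & 1 \end{bmatrix}, \quad F_1^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix};$$

$$E_1^{-1} F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

(Remember that this is a premultiplication; the effect is on the rows.)

$$F_2^{-1} E_1^{-1} F_1^{-1} = F_2^{-1} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix};$$

$$G_1^{-1} F_2^{-1} E_1^{-1} F_1^{-1} = G_1^{-1} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 5 \\ 0 & 0 & 1 \end{bmatrix};$$

$$F_3^{-1} G_1^{-1} F_2^{-1} E_1^{-1} F_1^{-1} = F_3^{-1} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 5 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 5 \\ 1 & -2 & -4 \end{bmatrix};$$

$$G_2^{-1} F_3^{-1} G_1^{-1} F_2^{-1} E_1^{-1} F_1^{-1} = G_2^{-1} \begin{bmatrix} 1 & 0 & -1 \\ -1 & 2 & 5 \\ 1 & -2 & -4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ 1 & -2 & -4 \end{bmatrix} = Q.$$

Thus we have written

$$A = DQ$$

where

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix} \quad \text{and} \quad Q = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ 1 & -2 & -4 \end{bmatrix}.$$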
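
The chain of column operations in Solution 2.6.3 can be replayed in a few lines. This Python sketch is an illustration under stated assumptions: the matrix A is taken as the one implied by the first product of the solution, the six elementary matrices and their inverses are typed in from the verbal descriptions of the column operations, and the names `chain` and `matmul` are introduced here only for convenience.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def chain(*factors):
    """Product of the given matrices, taken from left to right."""
    result = factors[0]
    for M in factors[1:]:
        result = matmul(result, M)
    return result


A = [[1, 0, -1], [0, 2, 4], [-1, 2, 4]]

F1 = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]                # column 1 added to column 3
E1 = [[1, 0, 0], [0, Fraction(1, 2), 0], [0, 0, 1]]   # column 2 divided by 2
F2 = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]                # column 2 added to column 1
G1 = [[1, 0, 0], [0, 1, -4], [0, 0, 1]]               # -4 (column 2) added to column 3
F3 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]                # column 3 added to column 2
G2 = [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]               # -1 (column 2) added to column 1

# Inverses, each undoing the corresponding column operation.
F1_inv = [[1, 0, -1], [0, 1, 0], [0, 0, 1]]
E1_inv = [[1, 0, 0], [0, 2, 0], [0, 0, 1]]
F2_inv = [[1, 0, 0], [-1, 1, 0], [0, 0, 1]]
G1_inv = [[1, 0, 0], [0, 1, 4], [0, 0, 1]]
F3_inv = [[1, 0, 0], [0, 1, 0], [0, -1, 1]]
G2_inv = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]

D = chain(A, F1, E1, F2, G1, F3, G2)
Q = chain(G2_inv, F3_inv, G1_inv, F2_inv, E1_inv, F1_inv)

assert D == [[1, 0, 0], [0, 1, 0], [0, 0, -1]]
assert Q == [[1, 0, -1], [0, 2, 4], [1, -2, -4]]
assert matmul(D, Q) == A
```

Because only postmultiplications are used, every operation acts on columns, and the inverses appear in the reverse order when they are collected into Q.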


By operating on the right as well as on the left of a square matrix A by elementary 
matrices, as well as operating on the left alone (considered in the previous sections) 
and operating on the right alone we can reduce a given matrix A to the following forms: 


A = PD, P nonsingular, D diagonal; (2.6.1)
A = DQ, D diagonal, Q nonsingular; (2.6.2)
A = RDS, D diagonal, R, S nonsingular; (2.6.3)
A = ZDZ', Z nonsingular, D diagonal, when A = A'; (2.6.4)
A = LU, L lower and U upper triangular; (2.6.5)
A = $L_1 D U_1$, $L_1$ lower and $U_1$ upper triangular, D diagonal, (2.6.6)


where the representations in (2.6.5) and (2.6.6) are not always possible. They depend 
on the nature of A. As an application of (2.6.4) we can consider reduction of quadratic 
forms to their canonical forms. 


2.6.3 Reduction of quadratic forms to their canonical forms 


Let

$$u = X'AX, \quad A = A'$$

be a general quadratic form, where X is an n × 1 vector, A is an n × n matrix of known
elements and, as we have seen before, A can be taken as a symmetric matrix without
any loss of generality. By using (2.6.4) write A as

$$A = PDP'$$

where P is a nonsingular matrix, D is a diagonal matrix and P' is the transpose of P.
Then the quadratic form

$$u = X'AX = X'PDP'X = Y'DY$$

where

$$Y = P'X = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$

Let the diagonal elements in D be $d_1, \ldots, d_n$ (some of these may be zeros depending
upon the singularity of A). Then

$$u = (y_1, \ldots, y_n) \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = d_1 y_1^2 + \cdots + d_n y_n^2, \tag{2.6.7}$$

a linear combination of the squares of the $y_j$'s. This form in (2.6.7) is known as the canonical
form of the quadratic form. This reduction has many applications in different fields.
Many such applications are given in the book Quadratic Forms in Random Variables:
Theory and Applications [7]. We may also observe one interesting aspect in (2.6.7). All
the $y_j$, j = 1, ..., n are linear functions of the original $x_j$'s (the elements in X) since P' is
a matrix of constants.


Example 2.6.4. Reduce the following quadratic form to its canonical form:

$$u = x_1^2 - 2x_1x_3 + 2x_2^2 + 8x_2x_3 + 4x_3^2.$$

Solution 2.6.4. Writing a symmetric matrix A the quadratic form can be written as

$$u = (x_1, x_2, x_3) \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ -1 & 4 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 4 \\ -1 & 4 & 4 \end{bmatrix}.$$

This matrix is already reduced to the form A = PDP' in Example 2.6.1, where

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -5 \end{bmatrix}, \quad P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix} \Rightarrow P' = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}$$

and

$$P'X = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 - x_3 \\ x_2 + 2x_3 \\ x_3 \end{bmatrix}.$$

Writing

$$Y' = (x_1 - x_3,\ x_2 + 2x_3,\ x_3) \Rightarrow y_1 = x_1 - x_3, \quad y_2 = x_2 + 2x_3, \quad y_3 = x_3$$

we have

$$u = X'AX = X'PDP'X = Y'DY = y_1^2 + 2y_2^2 - 5y_3^2.$$

Observe that D in the canonical reduction above is not unique. By further elementary
operations we could have taken out various factors from the diagonal elements. Thus
D in the representation PDP' is not unique.
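
A numerical spot check of the canonical form of Example 2.6.4 is easy to set up. The Python sketch below is only an illustration; the function names `u_original` and `u_canonical` are assumptions introduced here, and the grid of integer test points is an arbitrary choice.

```python
def u_original(x1, x2, x3):
    """The quadratic form of Example 2.6.4 in the original variables."""
    return x1**2 - 2*x1*x3 + 2*x2**2 + 8*x2*x3 + 4*x3**2


def u_canonical(x1, x2, x3):
    """The same form written through y = P'x as y1^2 + 2 y2^2 - 5 y3^2."""
    y1 = x1 - x3
    y2 = x2 + 2*x3
    y3 = x3
    return y1**2 + 2*y2**2 - 5*y3**2


# Compare the two expressions on a small grid of integer points.
points = [(a, b, c) for a in range(-3, 4) for b in range(-3, 4) for c in range(-3, 4)]
assert all(u_original(*p) == u_canonical(*p) for p in points)
```

Agreement on a grid is not a proof, but expanding $(x_1 - x_3)^2 + 2(x_2 + 2x_3)^2 - 5x_3^2$ by hand gives the same polynomial, which is what the check reflects.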


2.6.4 Rotations 


Here we look at stretching, rotations and projections. The basic ideas will be illustrated
in a 2-space. Permutation matrices are already considered in the beginning of
Section 2.6. Consider a 2 × 2 scalar matrix

$$A = cI_2 = \begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix}.$$

If we premultiply a 2 × 2 matrix with A then every row vector there is multiplied by c,
or we say, stretched by c. Then the above A is a stretching operator. Let

$$B = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

Then B operating on the left of X gives

$$BX = \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix}.$$

For example a point (2, 1) or the vector a = 2i + j, i = (1, 0), j = (0, 1) goes to b = -i + 2j.
The dot product is

$$(x_1, x_2) \begin{bmatrix} -x_2 \\ x_1 \end{bmatrix} = -x_1x_2 + x_2x_1 = 0.$$

This means our matrix B rotates the vector $X = x_1 i + x_2 j$ through a 90° angle. Then B
above is a rotation operator. Let

$$C = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad CX = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}.$$

The two points $(x_1, x_2)$ and $(x_2, x_1)$ are the mirror images on both sides of the line $x_1 = x_2$.
Here we say that C is a reflection operator. Let

$$D = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad DX = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}.$$

This gives the projection of the vector $X = x_1 i + x_2 j$ onto the $x_1$-axis, namely $x_1 i$. Then D
above is called a projection operator, see Figure 2.6.1. These ideas can be generalized
to the n-space, n = 3, 4, ....


Figure 2.6.1: Stretching, rotation, reflection and projection.
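
The four operators can be applied to a concrete vector. The Python sketch below is an illustration only: the matrices are the standard 2 × 2 choices consistent with the text's example in which the point (2, 1) goes to (-1, 2), the stretching factor c = 3 is an arbitrary assumption, and the helper `apply` is a name introduced here.

```python
def apply(M, v):
    """Return the product M v for a 2 x 2 matrix M and a 2-vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


c = 3                                  # arbitrary stretching factor
stretch = [[c, 0], [0, c]]             # scalar matrix c I
rotate90 = [[0, -1], [1, 0]]           # rotation through 90 degrees
reflect = [[0, 1], [1, 0]]             # reflection in the line x1 = x2
project = [[1, 0], [0, 0]]             # projection onto the x1-axis

x = [2, 1]
rotated = apply(rotate90, x)

assert apply(stretch, x) == [6, 3]
assert rotated == [-1, 2]
assert x[0] * rotated[0] + x[1] * rotated[1] == 0   # orthogonal to the original
assert apply(reflect, x) == [1, 2]
assert apply(project, x) == [2, 0]
assert apply(project, apply(project, x)) == apply(project, x)
```

The last assertion shows a property typical of projections: applying the operator twice gives the same result as applying it once.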


2.6.5 Linear transformations 


Consider a vector X in n-space, an n-vector, or a point in the Euclidean n-space,
$X' = (x_1, \ldots, x_n)$. Let A be an m × n matrix. Then we have the general properties

(a) AO = O, where O is a null vector;

(b) A(cX) = cAX, c is a scalar;

(c) A(X + Y) = AX + AY where Y is another n-vector.


Property (c) says that the operation is linear. U = AX in general represents a transformation
of X going to U where every element of U is a linear function of X. That is, if
$U' = (u_1, \ldots, u_m)$ and $A = (a_{ij})$ an m × n matrix of constants then

$$u_i = a_{i1}x_1 + \cdots + a_{in}x_n, \quad i = 1, \ldots, m.$$

It is a linear transformation. Later we will give a more general definition of a linear
transformation after introducing a more general definition for a vector. In that case
X will be an object called vector and "A" will stand for an operator and AX will be
A operating on X. In that case "scalar multiplication" and "addition" will also be redefined.
In terms of general objects called vectors and an operator denoted by A the
transformation satisfying (a), (b), (c) above will be called a linear transformation. For
the time being we will confine our discussion to ordered n-tuples of real numbers as
vectors and A representing an m × n matrix of constants, c a scalar and addition and
scalar multiplication as defined before.


Definition 2.6.1 (Linear and orthogonal transformations). Y = AX where A is an m × n
matrix of constants and X is an n × 1 vector of real variables, will be called a linear
transformation. When A is n × n and orthonormal, AA' = I, A'A = I, then the transformation
is called an orthogonal transformation. When A is m × n, m < n and $AA' = I_m$, it
is called a semiorthonormal transformation.


(v) Geometrically, an orthogonal transformation represents a rotation of the axes of 
coordinates. 


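Example 2.6.5. Show that the following transformations are orthogonal transformations:
Y = AX where

$$\text{(a)} \quad A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

(This rotates the axes through an angle θ.)

$$\text{(b)} \quad A = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{bmatrix}$$


Solution 2.6.5. (a) This transformation is

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \Rightarrow$$

$$y_1 = x_1\cos\theta - x_2\sin\theta, \quad y_2 = x_1\sin\theta + x_2\cos\theta;$$

$$AA' = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = A'A.$$

Hence this linear transformation is an orthogonal transformation.

$$\text{(b)} \quad \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & 0 & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \Rightarrow$$

$$y_1 = \frac{1}{\sqrt{3}}(x_1 + x_2 + x_3), \quad y_2 = \frac{1}{\sqrt{6}}(x_1 - 2x_2 + x_3), \quad y_3 = \frac{1}{\sqrt{2}}(x_1 - x_3); \quad AA' = I, \ A'A = I$$

and hence this linear transformation is an orthogonal transformation. [If, for example,
the last row in (b) is deleted we have a semiorthogonal transformation.]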
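
The conditions AA' = I and A'A = I can be checked numerically for the 3 × 3 matrix of part (b). The Python sketch below is a floating-point illustration only: the rows are entered as the normalized coefficient vectors (1, 1, 1), (1, -2, 1), (1, 0, -1), and the helper `is_identity` with its tolerance is an assumption introduced here.

```python
import math


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(M):
    """Interchange rows and columns."""
    return [list(col) for col in zip(*M)]


def is_identity(M, tol=1e-12):
    """True when the square matrix M equals the identity up to rounding error."""
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(M)) for j in range(len(M[0])))


s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
A = [[1 / s3, 1 / s3, 1 / s3],
     [1 / s6, -2 / s6, 1 / s6],
     [1 / s2, 0.0, -1 / s2]]

# Full matrix: both products are the identity (orthogonal transformation).
orthogonal = is_identity(matmul(A, transpose(A))) and is_identity(matmul(transpose(A), A))

# Last row deleted: AA' is still I_2 but A'A is no longer I_3 (semiorthogonal case).
A_semi = A[:2]
semi_rows_ok = is_identity(matmul(A_semi, transpose(A_semi)))
semi_cols_ok = is_identity(matmul(transpose(A_semi), A_semi))

assert orthogonal
assert semi_rows_ok and not semi_cols_ok
```

The second half illustrates the bracketed remark: with a row removed the remaining rows are still orthonormal, but the columns are not.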


Suppose we transform an n-vector to an m-vector by the linear transformation Y =
AX where A is m × n. Suppose then we transform this m-vector Y to a p-vector Z by the
linear transformation Z = BY where B is p × m. What is the net result of transforming
X to Z?

$$Z = BY = BAX.$$

Here BA is still a matrix of constants, BA is p × n. Hence X → Z is also a linear transformation.


(vi) Product of two linear transformations, in the above sense, is again a linear 
transformation. 


Let us see what happens to the shape, angles etc under a linear transformation. In
order to illustrate the changes we will examine a simple linear transformation. Let
$0 \le x_1 \le 2$, $0 \le x_2 \le 1$. Consider the linear transformation

$$y_1 = x_1 + x_2, \quad y_2 = x_1 \quad \text{or} \quad \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = A \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \Rightarrow$$

$$x_1 = y_2, \quad x_2 = y_1 - y_2.$$

Under this transformation the rectangle OACB, with angles $\frac{\pi}{2}$ each, that is, the angle
between OA and OB is $\frac{\pi}{2}$ in Figure 2.6.2 and the lengths OA = 2 and OB = 1, is transformed
into a parallelogram with the angle between OA and OB changed to $\frac{\pi}{4}$ and the lengths
of the sides changed to 1 and $2\sqrt{2}$, see the illustrations in Figure 2.6.3. Thus, in this
case the shape is not preserved, the lengths are not preserved and the angles are not
preserved.

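Now, consider the transformation $y_1 = \frac{1}{\sqrt{2}}x_1 + \frac{1}{\sqrt{2}}x_2$ and $y_2 = \frac{1}{\sqrt{2}}x_1 - \frac{1}{\sqrt{2}}x_2$, that is,

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = A \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}.$$

Here A is an orthonormal matrix. Let us see what happens to the rectangle OACB in
the $(x_1, x_2)$-plane. Under this transformation

$$x_1 = \frac{1}{\sqrt{2}}y_1 + \frac{1}{\sqrt{2}}y_2, \quad x_2 = \frac{1}{\sqrt{2}}y_1 - \frac{1}{\sqrt{2}}y_2$$

and

$$x_1 = 0 \Rightarrow y_1 = -y_2, \quad x_1 = 2 \Rightarrow y_1 + y_2 = 2\sqrt{2},$$
$$x_2 = 0 \Rightarrow y_1 = y_2, \quad x_2 = 1 \Rightarrow y_1 - y_2 = \sqrt{2}.$$


Figure 2.6.3: Additional linear transformations.


Here the angle between OA and OB is preserved as $\frac{\pi}{2}$. The lengths of the sides
are preserved. The shape is also preserved. The net effect is the rotation of the axes of
coordinates through an angle $\theta = \frac{\pi}{4}$ here. The general orthogonal transformation is of
the form

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}, \quad AA' = I, \ A'A = I.$$

Orthogonal transformations are simply rotations of the axes of coordinates through an
angle θ where the angles, lengths and shapes in the original region in the $(x_1, x_2)$-plane
are preserved in the $(y_1, y_2)$-plane.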
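
The contrast between a general linear transformation and an orthonormal one can be seen by transforming the two sides of the rectangle that meet at the origin. The Python sketch below is an illustration under stated assumptions: the sides are taken as the vectors (2, 0) and (0, 1), the general matrix is the one with rows (1, 1) and (1, 0), and the helper names are introduced here.

```python
import math


def apply(M, v):
    """Return the product M v for a 2 x 2 matrix M and a 2-vector v."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def length(v):
    """Euclidean length of a 2-vector."""
    return math.hypot(v[0], v[1])


def angle(u, v):
    """Angle between two nonzero 2-vectors, in radians."""
    return math.acos((u[0] * v[0] + u[1] * v[1]) / (length(u) * length(v)))


side_long = [2.0, 0.0]     # side of length 2 along the x1-axis
side_short = [0.0, 1.0]    # side of length 1 along the x2-axis

r = 1 / math.sqrt(2)
general = [[1.0, 1.0], [1.0, 0.0]]     # y1 = x1 + x2, y2 = x1
orthonormal = [[r, r], [r, -r]]        # rows of unit length, mutually orthogonal


def describe(M):
    """Lengths of the two transformed sides and the angle between them."""
    p, q = apply(M, side_long), apply(M, side_short)
    return length(p), length(q), angle(p, q)


g_long, g_short, g_angle = describe(general)
o_long, o_short, o_angle = describe(orthonormal)

# General transformation: side lengths and the right angle both change.
assert math.isclose(g_long, 2 * math.sqrt(2)) and math.isclose(g_angle, math.pi / 4)
# Orthonormal transformation: side lengths 2 and 1 and the right angle survive.
assert math.isclose(o_long, 2.0) and math.isclose(o_short, 1.0)
assert math.isclose(o_angle, math.pi / 2)
```

The same comparison works for any other pair of vectors, since an orthonormal matrix leaves every dot product unchanged.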




2.6.6 Orthogonal bases for a vector subspace 


Suppose we have located a basis for a given vector subspace. How can we convert this 
basis to an orthogonal system of vectors or to come up with an orthogonal basis for 
a given vector subspace? Recall the Gram-Schmidt orthogonalization process from 
Chapter 1. This is one method of selecting a linear function of the given vectors so that 
the new set will be mutually orthonormal. Then, after locating a basis transform them 
to an orthonormal system by Gram-Schmidt orthogonalization process to obtain an 
orthogonal basis. 
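
The Gram-Schmidt step described above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code: the function name `gram_schmidt` and the tolerance used to detect linear dependence are assumptions, and the input vectors are the basis (0, 0, -1, 0), (0, -1, 0, 0), (1, 1, 1, 1) used in Example 2.6.6 below.

```python
import math


def dot(u, v):
    """Dot product of two vectors of the same length."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same subspace as `vectors`."""
    basis = []
    for u in vectors:
        w = [float(x) for x in u]
        for v in basis:
            coeff = dot(u, v)                  # component of u along v
            w = [wi - coeff * vi for wi, vi in zip(w, v)]
        norm = math.sqrt(dot(w, w))
        if norm < tol:
            raise ValueError("the given vectors are linearly dependent")
        basis.append([wi / norm for wi in w])  # normalize to unit length
    return basis


U = [[0, 0, -1, 0], [0, -1, 0, 0], [1, 1, 1, 1]]
V = gram_schmidt(U)

# Each vector has unit length and distinct vectors are orthogonal.
for i in range(len(V)):
    for j in range(len(V)):
        expected = 1.0 if i == j else 0.0
        assert math.isclose(dot(V[i], V[j]), expected, abs_tol=1e-12)
```

Each new vector is obtained by subtracting from the current one its components along the vectors already accepted, and then dividing by its length, which is exactly the pattern of the hand computation.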


Example 2.6.6. Construct an orthonormal basis for the row subspace of the matrix

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 3 & -1 & 1 & 3 \end{bmatrix}.$$


Solution 2.6.6. Through elementary operations on the left try to determine the rank
and a basis. Writing the operations by using our standard notations we have the following:

$$-1(1) + (2); \ -1(1) + (3); \ -3(1) + (4) \Rightarrow$$

$$A \Rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & 0 & 0 \\ 0 & -2 & -2 & 0 \\ 0 & -4 & -2 & 0 \end{bmatrix} = B;$$

$$-1(2) + (4); \ -1(3) + (4) \Rightarrow$$

$$B \Rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & 0 & 0 \\ 0 & -2 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = C;$$

$$-1(2) + (3) \Rightarrow$$

$$C \Rightarrow \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -2 & 0 & 0 \\ 0 & 0 & -2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} = C_1.$$

Hence the rank is 3 and a basis, from $C_1$ above, is

$$U_1 = (0, 0, -1, 0), \quad U_2 = (0, -1, 0, 0), \quad U_3 = (1, 1, 1, 1).$$

Now apply the Gram-Schmidt process. Let

$$V_1 = \frac{U_1}{\|U_1\|} = (0, 0, -1, 0)$$

which is the normalized $U_1$. Consider

$$W_2 = U_2 - (U_2 V_1') V_1 = (0, -1, 0, 0) - \left[ (0, -1, 0, 0) \begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} \right] V_1 = (0, -1, 0, 0);$$

$$V_2 = \frac{W_2}{\|W_2\|} = (0, -1, 0, 0);$$

$$W_3 = U_3 - (U_3 V_1') V_1 - (U_3 V_2') V_2$$

$$= (1, 1, 1, 1) - \left[ (1, 1, 1, 1) \begin{bmatrix} 0 \\ 0 \\ -1 \\ 0 \end{bmatrix} \right] (0, 0, -1, 0) - \left[ (1, 1, 1, 1) \begin{bmatrix} 0 \\ -1 \\ 0 \\ 0 \end{bmatrix} \right] (0, -1, 0, 0)$$

$$= (1, 1, 1, 1) + (0, 0, -1, 0) + (0, -1, 0, 0) = (1, 0, 0, 1);$$

$$V_3 = \frac{W_3}{\|W_3\|} = \frac{1}{\sqrt{2}} (1, 0, 0, 1).$$

Evidently

$$\|V_i\| = 1, \ i = 1, 2, 3 \quad \text{and} \quad V_1 V_2' = 0, \quad V_1 V_3' = 0, \quad V_2 V_3' = 0.$$


Verification. Can we write all the row vectors in A as linear functions of $V_1$, $V_2$ and
$V_3$? Note that

$$(1, 1, 1, 1) = -V_1 - V_2 + \sqrt{2}\, V_3;$$
$$(1, -1, 1, 1) = -V_1 + V_2 + \sqrt{2}\, V_3;$$
$$(1, -1, -1, 1) = V_1 + V_2 + \sqrt{2}\, V_3$$

and the fourth row is already a linear function of the first three rows. Hence $\{V_1, V_2, V_3\}$
is an orthonormal basis for the row subspace of A.


2.6.7 A vector subspace, a more general definition


Now, we are in a better position to give a more general definition to a vector subspace. 
The elements of the subspace are some general objects satisfying some conditions. 
In the definition that we are going to give here the operations "scalar multiplication" 
and “addition” are as defined for vectors (as n-tuples) and matrices before. A still more 
abstract definition can be given by defining “scalar multiplication" and “addition” as 
well. Let S be a set of some objects on which one can define scalar multiplication and 
addition. Suppose S and its elements satisfy the following conditions: 

(a) If V ∈ S then cV ∈ S where c is any scalar, including zero.

(b) If U ∈ S, V ∈ S then U + V ∈ S.


That is, S is closed under scalar multiplication and addition. Then S will be called a 
vector subspace. 

Note that the same definitions introduced for linear dependence, independence, 
rank or dimension etc go through for this general definition also. 


Example 2.6.7. Check whether the following sets satisfy the conditions for a vector 
subspace and if so construct a basis for the subspace: 

(a) The set consisting only of the null vector. 

(b) The set of all polynomials in t of degree ≤ 5.

(c) The set of all 2 x 2 matrices with real numbers as elements. 

(d) The set of all 2x1, 2x 2 and 2x3 matrices. 


Solution 2.6.7. (a) It is a trivial case of a vector subspace. Since a null vector is not
counted in the set of linearly independent vectors we take the dimension of this subspace
as zero.

(b) Let this set be denoted by S. Then $2t + t^3$ and $t^2$ are two such polynomials in S.
Thus, for example, $5(2t + t^3) = 10t + 5t^3$ is in S, $12t^2 \in S$. If

$$a_0 + a_1 t + \cdots + a_5 t^5 \quad \text{and} \quad b_0 + b_1 t + \cdots + b_5 t^5$$

are two polynomials in S then their sum and scalar multiples are also in S. These two
operations cannot create a polynomial not in S. We cannot create a 6-th degree or
higher degree polynomial by addition and scalar multiplication. Hence S is a vector
subspace. Note that any general polynomial of degree up to 5 can be generated by the
following quantities

$$1, t, t^2, t^3, t^4, t^5.$$

Obviously these are in S and linearly independent. None can be written as a linear
combination of the others. Hence a basis is $\{1, t, t^2, t^3, t^4, t^5\}$ and the dimension of this
vector subspace is 6. [Note that, for example, $t^2 = t \times t$ but this is not a scalar multiple or
a constant multiple of t. Hence $t^2$ cannot be written in terms of other elements through
the two operations of scalar multiplication and addition. Thus, these elements $1, t, \ldots, t^5$
are linearly independent.]

(c) Let this set be denoted by S. All elements are 2 × 2 matrices. Three typical elements
are

$$V_1 = \begin{bmatrix} a_1 & b_1 \\ c_1 & d_1 \end{bmatrix}, \quad V_2 = \begin{bmatrix} a_2 & b_2 \\ c_2 & d_2 \end{bmatrix} \quad \text{and} \quad V_3 = \begin{bmatrix} a_3 & b_3 \\ c_3 & d_3 \end{bmatrix} \in S.$$

Then obviously $V_1 + V_2$ is a 2 × 2 matrix and hence in S. Also $cV_1 \in S$ for any scalar c.
For c = 0 it is a null matrix and hence the null matrix is also in S. For $V = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$,

$$O = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \in S, \quad -V = \begin{bmatrix} -a & -b \\ -c & -d \end{bmatrix} \in S,$$

$$V - V = O \in S, \quad IV = V = VI \in S, \quad I \in S,$$

$$V + O = V, \quad c_1 c_2 V = c_1 (c_2 V) = c_2 (c_1 V)$$

where $c_1$ and $c_2$ are scalars. Obviously the following general conditions are satisfied
by the elements of S in this case:

(1) $V_1 + V_2 = V_2 + V_1$, $V_1 \in S$, $V_2 \in S$

(2) $V_1 + (V_2 + V_3) = (V_1 + V_2) + V_3$, $V_i \in S$

(3) V + O = V for all V, V ∈ S, O ∈ S

(4) -V ∈ S, V - V = V + (-1)V = O

(5) IV = V, I ∈ S

(6) $(c_1 c_2) V = c_1 (c_2 V)$, $c_1, c_2$ scalars

(7) $c(V_1 + V_2) = cV_1 + cV_2$, c a scalar

(8) $(c_1 + c_2) V = c_1 V + c_2 V$.

These eight properties are in fact the conditions that we will impose when we have a
more abstract definition of a vector subspace where we will also define what is meant
by "+" and "cV". But we will not make the definition more abstract in this book.


What is a basis for our vector subspace in this case? Note that a general matrix of
the form $V = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ can be generated as a linear function of the following four matrices:

$$U_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad U_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad U_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad U_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$

Hence a basis is $\{U_1, U_2, U_3, U_4\}$ and the dimension is 4 since these four are linearly
independent.

(d) Let the set be denoted by S. Let U be any 2 × 1 matrix and V any 2 × 2 matrix.
Then U ∈ S, V ∈ S whereas U + V is not defined and hence S is not a vector subspace
in this case.


2.6.8 A linear transformation, a more general definition


Taking the general definition of a vector subspace in Section 2.6.7 we can define a lin- 
ear transformation. Let the elements of a vector subspace S of Section 2.6.7 be some 
general objects where S is closed under addition and scalar multiplication. Let A repre- 
sent some operator operating on the elements of S such that the following conditions 
(a), (b), (c) are satisfied: 

(a) AO = O, where O is a null vector.

(b) A(cX) = cAX, c is a scalar and X ∈ S.

(c) A(X + Y) = AX + AY where X ∈ S, Y ∈ S.


Then Y = AX, X ∈ S is called a linear transformation. Of course this general definition
also covers the case when X is an ordered n-tuple and A an m x n matrix. Let us see 
what are the general operators that we can include under this general definition of a 
linear transformation. 

Example 2.6.8. Consider a vector subspace S of all real polynomials in the real scalar
variable θ. Then a typical vector in this subspace S, for example a polynomial of degree 3,
will be of the form

$$X = a_0 + a_1\theta + a_2\theta^2 + a_3\theta^3.$$

Consider the operator $A = \frac{d}{d\theta}$. Then A operating on X, namely AX, will be to differentiate
X with respect to θ. Show that A is a linear operator.


Solution 2.6.8.

$$AX = \frac{d}{d\theta} X = a_1 + 2a_2\theta + 3a_3\theta^2 \in S;$$

$$A(cX) = \frac{d}{d\theta}(cX) = c(a_1 + 2a_2\theta + 3a_3\theta^2) \in S,$$

where c is a constant, free of θ. If

$$U = b_0 + b_1\theta + b_2\theta^2 + b_3\theta^3 \in S$$

then obviously

$$A(U + X) = AU + AX = (a_1 + b_1) + 2(a_2 + b_2)\theta + 3(a_3 + b_3)\theta^2 \in S.$$

Hence $A = \frac{d}{d\theta}$ is a linear operator. What is the null space here? This means the set of
all X satisfying the condition

$$AX = O \Rightarrow \frac{d}{d\theta} X = O \Rightarrow X = \text{constant}.$$

The null space or the right null space is the set of all constants here.


Example 2.6.9. Consider the same vector subspace S of Example 2.6.8. Let A be the
integration operator, $A = \int (\cdot)\, d\theta$. Show that A is a linear operator.


Solution 2.6.9. Let X and U be as defined in Example 2.6.8. Then

$$AX = \int [a_0 + a_1\theta + a_2\theta^2 + a_3\theta^3]\, d\theta = a_0\theta + a_1\frac{\theta^2}{2} + a_2\frac{\theta^3}{3} + a_3\frac{\theta^4}{4} + c_1$$

where $c_1$ is a constant. Note that A(cX) = cAX here as well as

$$A(X + U) = AX + AU.$$

Hence $A = \int$ is a linear operator. $AX = O \Rightarrow \int X\, d\theta = O$. This has no solution here and
hence the null space for this vector subspace with respect to the operator $A = \int$ is
empty.


Example 2.6.10. Let S be the vector subspace of all real polynomials in the real scalar
variable t. Let A be the operator $AX = X^2$. (A operating on an element gives the square
of that element.) Is A linear?


Solution 2.6.10. Two typical vectors in this S, for example of degree 1 each, are of the
form $U = a_0 + a_1t$ and $V = b_0 + b_1t$. Then

$$AU = (a_0 + a_1t)^2 = a_0^2 + 2a_0a_1t + a_1^2t^2,$$

$$AV = (b_0 + b_1t)^2 = b_0^2 + 2b_0b_1t + b_1^2t^2,$$

$$A(U + V) = A[(a_0 + b_0) + (a_1 + b_1)t] = [(a_0 + b_0) + (a_1 + b_1)t]^2
\ne (a_0 + a_1t)^2 + (b_0 + b_1t)^2 = AU + AV.$$

Hence this operator A is not linear.
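
A single numerical counterexample is enough to confirm that squaring is not additive. The sketch below assumes polynomials are stored as coefficient lists in increasing powers of t; `poly_mul` and `square` are hypothetical helper names introduced only for this illustration.

```python
def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def square(p):
    """The operator A of the example: AX = X^2."""
    return poly_mul(p, p)


U = [1, 2]   # 1 + 2t
V = [3, 4]   # 3 + 4t
U_plus_V = [u + v for u, v in zip(U, V)]

lhs = square(U_plus_V)                                 # A(U + V)
rhs = [a + b for a, b in zip(square(U), square(V))]    # AU + AV
gap = [a - b for a, b in zip(lhs, rhs)]                # equals the cross term 2UV

print(lhs, rhs, gap)
```

The difference `gap` is exactly twice the product UV, which is the cross term that prevents additivity.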


Exercises 2.6 


2.6.1. Construct the matrix G, a product of 4 × 4 elementary matrices, so that GA effects
the permutation of the rows to the order 2, 3, 4, 1 where A is a general 4 × 4 matrix.


2.6.2. Write the following matrices in the form QDQ' where Q is a nonsingular matrix
and D is a diagonal matrix:


1 1 1 1 
102 are 2: TL S21 
A=!]0 1 1], B= jd C2|1 3 2 
2 1 4 -1 2 -1 

1 1 1 1 


2.6.3. Write the following matrices in the form PDQ where P and Q are nonsingular 
matrices and D is a diagonal matrix: 


2 1 -il 3 1 -1 -2 2 =] 
A-|1 1 2|, B=|2 O 2], C=|0 -1 O 
-10 4 2.1 4 2 1 5 


2.6.4. Write the matrices in Exercise 2.6.2 in the form (a) DQ, (b) PD where P and Q 
are nonsingular matrices and D is a diagonal matrix. 


2.6.5. Reduce the following quadratic forms to their canonical forms: 
(a) u, 2 2x2 X3 + 33x; + X5 - 4X, 
(b) uy = 3x2 + x3 - 2x4X3 + 2X1X3 - 2x3X3 + 2x3. 


2.6.6. Construct two orthonormal bases each for the row subspaces of the matrices in 
Exercise 2.6.3. 


2.6.7. Using the general definition in Section 2.6.7, are the following sets vector subspaces?

(a) The set of all n × n lower triangular matrices for a fixed n. If so, what is its dimension?

(b) The set of all n × n diagonal matrices for a fixed n.

(c) The set of all integers, including zero.

(d) The set of all possible scalar functions f(x) of the real scalar variable x defined on
the interval [3, 5].

(e) All couplets of real numbers (a, b), $a \ge 0$, $b \ge 0$.


2.6.8. Taking the general definition of linear transformation in Section 2.6.8, are the
following linear transformations?

(a) Let S be a vector subspace of polynomials in t of degree $\le 3$. For $V \in S$ let $AV = (1 + t)V$.

(b) Let S be the same subspace in (a) above. Let A be such that $AV = V + 5$.

(c) Let S be the subspace of 3 × 3 matrices. For $V \in S$ let $AV = V + I_3$, where $I_3$ is the
identity matrix.


2.6.9. If $A + B = I_n$, $A = A'$, $B = B'$, $AB = O$, where A and B are n × n matrices, then show
that A and B are idempotent with ranks r and n - r respectively.


2.6.10. If $A + B = I_n$, $A = A'$, $B = B'$, $AB = O$ then show that rank of A plus rank of B is
n, where A and B are n × n matrices.


2.6.11. If $A + B = I_n$, $A = A'$, $B = B'$, where A and B are n × n matrices, then show that
both A and B can be reduced to diagonal forms by the same orthonormal matrix, say P.


2.6.12. Show that every nonsingular matrix can be written as a product of the basic 
elementary matrices. 


2.6.13. Write the matrices in Exercise 2.6.3 into the forms (a) DQ, (b) PD where D is a
diagonal matrix and P and Q are nonsingular matrices.


2.6.14. Construct one orthonormal basis each for the right null spaces of the matrices 
in Exercise 2.6.3, if possible. 


2.6.15. Construct two examples each for the following: (a) sum of two nonsingular 
matrices is singular; (b) sum of two nonsingular matrices is nonsingular; (c) sum of 
two singular matrices is singular; (d) sum of two singular matrices is nonsingular. 


2.6.16. (a) Can the product of two nonsingular matrices be singular? (b) Can the prod- 
uct of two singular matrices be nonsingular? Prove your assertions. 


2.6.17. Let A and B be rectangular matrices where AB is defined. (a) If A and B are of 
full ranks can AB be of less than full rank? (b) If A and B are of less than full ranks 
each can AB be of full rank? (c) If A or B is of full rank and the other is of less than full 
rank can AB be of full rank? Prove your assertions. 


2.6.18. Generalized inverse of a matrix. Consider an m × n matrix A of any rank.
A generalized inverse (there can be many such inverses for a given matrix) or g-inverse
of A is an n × m matrix, denoted by $A^{-}$, such that $X = A^{-}b$ is a solution of the consistent
system of linear equations $AX = b$ or equivalently $A^{-}$ is a g-inverse iff $AA^{-}A = A$.
Evaluate a g-inverse of A where
m f 1 2| l 
10 1 


2.6.19. Let $A^{-}$ be a g-inverse of A and let $H = A^{-}A$. Show that (a) H is idempotent,
(b) $AH = A$, (c) rank of A = rank of H = trace of H.


2.6.20. Let $AX = b$ be a consistent system of linear equations. Show that a general
solution of this system can be written as

$$A^{-}b + (H - I)Z$$

where $A^{-}$ is a g-inverse of A, $H = A^{-}A$ and Z is arbitrary.


2.6.21. Consider the space S of all n × n matrices. Construct three subspaces $S_1, S_2, S_3$
of this space S.




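2.6.22. Consider the vector space S of 3 × 3 matrices. Let

$$A_1 = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\quad
A_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{bmatrix},\quad
A_3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$

Are $A_1, A_2, A_3$, as elements in the vector space S, linearly independent? Prove your
assertion.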


2.6.23. Consider the vector space S of all polynomials of degree less than or equal to
2 in the real variable t. Let

$$p_1(t) = 1 + 2t,\quad p_2(t) = t^2,\quad p_3(t) = 3 + 5t + 2t^2.$$

Are these linearly independent? Prove your assertion.


2.6.24. Show that any matrix of rank r can be written as a sum of r matrices of rank 
one each. 


2.6.25. Let $S_1$ and $S_2$ be nontrivial subspaces of a vector space S. Show that $S_1$ and $S_2$
are orthogonal to each other if and only if each basis element of $S_1$ is orthogonal to all
basis elements of $S_2$ or vice versa.


2.6.26. Let S be the space of all polynomials in x of degree not exceeding n, for a
fixed n, with the inner product

$$\int_{-1}^{1} p(x)q(x)\,dx,\quad p(x) \in S,\ q(x) \in S$$

defined. Show that (Cauchy-Schwarz inequality)

$$\left|\int_{-1}^{1} p(x)q(x)\,dx\right| \le \left[\int_{-1}^{1} p^2(x)\,dx\right]^{1/2}\left[\int_{-1}^{1} q^2(x)\,dx\right]^{1/2}.$$


2.6.27. Let S be the same vector space of polynomials in Exercise 2.6.26. Observe
that $1, x, x^2, \ldots, x^n$ is a basis of S. Obtain an orthonormal basis by applying the
Gram-Schmidt orthogonalization process.


2.6.28. Show that the orthonormal basis obtained in Exercise 2.6.27 can be written in
terms of the Legendre polynomials

$$L_k(x) = \frac{1}{2^k k!}\frac{d^k}{dx^k}(x^2 - 1)^k.$$

2.7 Partitioning of matrices 


This is a very convenient way of doing matrix multiplications and computations of 
inverses when we have large matrices [number of rows or columns or both large]. The 
ideas will be introduced by looking at some special cases. 




2.7.1 Partitioning and products 


Consider the multiplication involving two vectors

$$(1, -1, 2)\begin{pmatrix} 3 \\ 4 \\ -5 \end{pmatrix} = [(1)(3) + (-1)(4)] + [(2)(-5)] = [-1] + [-10] = -11.$$

Let

$$A_1 = (1, -1),\quad A_2 = (2),\quad B_1 = \begin{pmatrix} 3 \\ 4 \end{pmatrix},\quad B_2 = (-5).$$

Then the above multiplication can be written as

$$(A_1, A_2)\begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = A_1B_1 + A_2B_2,$$

$$A_1B_1 = (1)(3) + (-1)(4) = -1,\quad A_2B_2 = (2)(-5) = -10.$$

Here the multiplication is carried out as if $A_1, A_2, B_1, B_2$ are scalars, but keeping the
order in which the submatrices occur. This means that instead of taking in the order
$A_1B_1$ we are not allowed to take in the order $B_1A_1$ when writing down the products. In
general, let a and b be n × 1 vectors. Let them be partitioned as follows:


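$$a = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix},\quad b = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix} \Rightarrow a' = (A_1', A_2'),$$

$$a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix},\quad
b = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix},\quad
A_1 = \begin{pmatrix} a_1 \\ \vdots \\ a_r \end{pmatrix},\quad
B_1 = \begin{pmatrix} b_1 \\ \vdots \\ b_r \end{pmatrix}.$$

Then

$$a'b = (a_1b_1 + \cdots + a_rb_r) + (a_{r+1}b_{r+1} + \cdots + a_nb_n) = A_1'B_1 + A_2'B_2.$$

This is possible as long as all the products are defined. If $A_1$ is r × 1 and $B_1$ is s × 1, $r \ne s$,
then such a partitioning will not produce a simplified form; $A_1'B_1$ is not defined here.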
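Consider the product

$$(1, -1 \mid 2)\begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \\ -5 & -2 & 1 \end{bmatrix} = (A_1, A_2)\begin{pmatrix} B_1 \\ B_2 \end{pmatrix},$$

$$A_1 = (1, -1),\quad A_2 = (2),\quad B_1 = \begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \end{bmatrix},\quad B_2 = (-5, -2, 1).$$

Let us multiply the $A_i$'s and $B_i$'s as if they are scalars but keeping the order. Then

$$(A_1, A_2)\begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = A_1B_1 + A_2B_2
= (1, -1)\begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \end{bmatrix} + (2)(-5, -2, 1)$$

$$= (-1, 1, 0) + (-10, -4, 2) = (-11, -3, 2).$$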

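We get the same final answer if they are multiplied element-wise or directly. Consider
the partition

$$\begin{bmatrix} 1 & -1 & \mid & 2 \\ 0 & 1 & \mid & 1 \end{bmatrix}
\begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \\ -5 & -2 & 1 \end{bmatrix}
= (A_1, A_2)\begin{pmatrix} B_1 \\ B_2 \end{pmatrix} = A_1B_1 + A_2B_2$$

where

$$A_1 = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix},\quad
A_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix},\quad
B_1 = \begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \end{bmatrix},\quad
B_2 = (-5, -2, 1).$$

Then

$$A_1B_1 = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 3 & 1 & 1 \\ 4 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix},\quad
A_2B_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}[-5, -2, 1]
= \begin{bmatrix} -10 & -4 & 2 \\ -5 & -2 & 1 \end{bmatrix}$$

and

$$A_1B_1 + A_2B_2 = \begin{bmatrix} -1 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix} + \begin{bmatrix} -10 & -4 & 2 \\ -5 & -2 & 1 \end{bmatrix}
= \begin{bmatrix} -11 & -3 & 2 \\ -1 & -2 & 2 \end{bmatrix}.$$

If we multiply element-wise we get the same answer.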
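
The agreement between the blockwise and the direct product can be verified mechanically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names `matmul` and `matadd` are illustrative only.

```python
def matmul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matadd(A, B):
    """Element-wise sum of two matrices of the same order."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(A, B)]


A = [[1, -1, 2],
     [0, 1, 1]]
B = [[3, 1, 1],
     [4, 0, 1],
     [-5, -2, 1]]

# Partition A after its second column and B after its second row.
A1 = [row[:2] for row in A]
A2 = [row[2:] for row in A]
B1 = B[:2]
B2 = B[2:]

blockwise = matadd(matmul(A1, B1), matmul(A2, B2))   # A1 B1 + A2 B2
direct = matmul(A, B)

print(blockwise)
print(direct == blockwise)
```

The column split of A and the row split of B occur at the same index, which is what makes every product in the sum defined.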


Definition 2.7.1 (Conformal partitioning of matrices). Two matrices A and B are said 
to be partitioned conformally for the product AB, when A and B are partitioned into 
submatrices and if the multiplication AB is carried out treating the submatrices as if 
they are scalars, but keeping the order, and when all products and sums of submatrices
involved are defined.


Example 2.7.1. Check whether the following partitioning is conformal for the product
AB:


1 0 |2 
@)  Ae|-31 1 | 1|5(4545. 
y uq 


By 
1 d 0 1 
1 | 0 2 
(b) A=]-1 | 1 1ļ|=(4,4), 
1 | 11 


Solution 2.7.1. (a) If the submatrices are treated as scalars and if the multiplication is
carried out we get

$$AB = A_1B_1 + A_2B_2$$


where 
1 1 1 1 2 
A= -1 > B =| | 
O O 1 1 
1 
2 
A,=]1], B,=(1,1,0,1). 
1 


$A_1B_1$, $A_2B_2$ and $A_1B_1 + A_2B_2$ are all defined and hence the partitioning is conformal for
the product AB.
(b) Here 


1 1 -12 
xw A-[. 0 1 ; 
1 


but $A_1B_1$ is not defined and hence the partition is not conformal. If the partition was
after the first row, that is, if $B_1 = (1, 1, -1, 2)$, then the partitioning would be conformal
for the product AB.


2.7.2 Partitioning of quadratic forms


In statistical theory, regression analysis, econometrics, model building and in many 
other areas one requires to study a part of a quadratic form or to study different parts 
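separately. This leads to partitioning of a quadratic form. Let X be an n × 1 vector of
real variables and let A be an n × n symmetric matrix of constants. We have already
seen that a quadratic form can be written in the form

$$u = X'AX,\quad A = A'. \tag{2.7.1}$$

Consider the partitions

$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},\quad
X_1 = \begin{pmatrix} x_1 \\ \vdots \\ x_r \end{pmatrix},\quad
X_2 = \begin{pmatrix} x_{r+1} \\ \vdots \\ x_n \end{pmatrix},\quad
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where $A_{11}$ is r × r, $A_{22}$ is (n - r) × (n - r), $A_{12}$ is r × (n - r) and $A_{21}$ is (n - r) × r. When $A = A'$
we have $A_{12}' = A_{21}$. Then in the partitioned form the quadratic form is the following:

$$u = (X_1', X_2')\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}.$$

It is easy to note that the partitioning is conformal to carry out all the multiplications.
Treating the submatrices as if they are scalars and completing the multiplications we
have

$$u = (X_1'A_{11} + X_2'A_{21},\ X_1'A_{12} + X_2'A_{22})\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}
= X_1'A_{11}X_1 + X_2'A_{21}X_1 + X_1'A_{12}X_2 + X_2'A_{22}X_2. \tag{2.7.2}$$

We obtain two quadratic forms $X_1'A_{11}X_1$ and $X_2'A_{22}X_2$ and two bilinear forms $X_2'A_{21}X_1$
and $X_1'A_{12}X_2$. When $A_{12}' = A_{21}$ we have an interesting property,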


$$(X_2'A_{21}X_1)' = X_1'A_{21}'(X_2')' = X_1'A_{12}X_2$$

which is the same as the other bilinear form. Further $X_2'A_{21}X_1$ and $X_1'A_{12}X_2$ are 1 × 1
matrices or scalars and hence they are equal.


(i) If P and Q are 1 × 1 matrices and if P' = Q then P = Q.


Then, when A = A',

$$u = X_1'A_{11}X_1 + 2X_1'A_{12}X_2 + X_2'A_{22}X_2. \tag{2.7.3}$$

We can study the quadratic forms involving the subvectors $X_1$ and $X_2$ as well as the
bilinear form involving $X_1$ and $X_2$ by using the representation in (2.7.3). As a numerical
example, consider

$$u = 2x_1^2 - x_2^2 + x_3^2 + 2x_1x_2 - 4x_2x_3
= (x_1, x_2 \mid x_3)\begin{bmatrix} 2 & 1 & \mid & 0 \\ 1 & -1 & \mid & -2 \\ 0 & -2 & \mid & 1 \end{bmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$

Consider the partitioning

$$X' = (X_1', X_2'),\quad X_1' = (x_1, x_2),\quad X_2' = x_3$$

and the corresponding conformal partitioning of A. Then

$$u = X_1'A_{11}X_1 + X_2'A_{22}X_2 + 2X_1'A_{12}X_2
= [2x_1^2 + 2x_1x_2 - x_2^2] + [x_3^2] + 2[-2x_2x_3].$$

Note that

$$-2x_2x_3 = X_1'A_{12}X_2 = X_2'A_{21}X_1.$$
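
The three-term representation of a partitioned quadratic form can be checked on the numerical example at a chosen point. This is a minimal sketch; the evaluation point (1, 2, 3) and the helper name `bilinear` are assumptions made only for illustration.

```python
def bilinear(x, A, y):
    """Evaluate x' A y for vectors x, y (lists) and matrix A (list of rows)."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


A = [[2, 1, 0],
     [1, -1, -2],
     [0, -2, 1]]
x = [1, 2, 3]       # an arbitrary evaluation point
r = 2               # X1 holds the first r components

X1, X2 = x[:r], x[r:]
A11 = [row[:r] for row in A[:r]]
A12 = [row[r:] for row in A[:r]]
A22 = [row[r:] for row in A[r:]]

direct = bilinear(x, A, x)
partitioned = (bilinear(X1, A11, X1)
               + 2 * bilinear(X1, A12, X2)
               + bilinear(X2, A22, X2))

print(direct, partitioned)
```

At this point the three pieces are 2, -24 and 9, which add to the direct value -13.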


2.7.3 Partitioning of bilinear forms 


A bilinear form in the vectors X and Y is a homogeneous linear form in X as well as in 
Y. Let X be p × 1 and Y be q × 1. Let A be a p × q matrix of constants. Then a bilinear
form can be written in the form

$$w = X'AY. \tag{2.7.4}$$


Bilinear forms have applications in studying covariances and correlations, in analysis 
of covariance techniques in design of experiments and in many related areas. Some 
theoretical aspects of bilinear forms in random variables may be seen from the book 
Bilinear Forms and Zonal Polynomials [8]. If a study of bilinear forms involving some 
subvectors of X and Y is undertaken then we need to partition the bilinear forms. Let 


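$$X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix},\quad
Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},\quad
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$$

where $X_1$ is $p_1 \times 1$, $X_2$ is $p_2 \times 1$, $p_1 + p_2 = p$, $Y_1$ is $q_1 \times 1$, $Y_2$ is $q_2 \times 1$, $q_1 + q_2 = q$, $A_{11}$ is $p_1 \times q_1$
and $A_{22}$ is $p_2 \times q_2$. Then $A_{12}$ is $p_1 \times q_2$, $A_{21}$ is $p_2 \times q_1$ and the partitioning is conformal.
Then

$$w = X'AY = (X_1', X_2')\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
= X_1'A_{11}Y_1 + X_2'A_{22}Y_2 + X_1'A_{12}Y_2 + X_2'A_{21}Y_1. \tag{2.7.5}$$

Note that here $A_{12}' \ne A_{21}$. In fact the whole matrix A is rectangular if $p \ne q$. In (2.7.5) we
get two bilinear forms involving $(X_1, Y_1)$ and $(X_2, Y_2)$ and two bilinear forms involving
$(X_1, Y_2)$ and $(X_2, Y_1)$. As a numerical example, let us consider the following bilinear
form in $X' = (x_1, x_2, x_3)$ and $Y' = (y_1, y_2)$:

$$w = x_1y_1 - x_2y_1 + x_3y_1 + 2x_1y_2 + x_2y_2 + x_3y_2
= (x_1 \mid x_2, x_3)\begin{bmatrix} 1 & \mid & 2 \\ -1 & \mid & 1 \\ 1 & \mid & 1 \end{bmatrix}\begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$

$$X_1 = x_1,\quad X_2' = (x_2, x_3),\quad Y_1 = y_1,\quad Y_2 = y_2,$$

$$A_{11} = 1,\quad A_{12} = 2,\quad A_{21} = \begin{pmatrix} -1 \\ 1 \end{pmatrix},\quad A_{22} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Note that

$$X_1'A_{11}Y_1 = x_1(1)y_1 = x_1y_1,$$

$$X_2'A_{22}Y_2 = (x_2, x_3)\begin{pmatrix} 1 \\ 1 \end{pmatrix}y_2 = x_2y_2 + x_3y_2,$$

$$X_1'A_{12}Y_2 = x_1(2)y_2 = 2x_1y_2,$$

$$X_2'A_{21}Y_1 = (x_2, x_3)\begin{pmatrix} -1 \\ 1 \end{pmatrix}y_1 = -x_2y_1 + x_3y_1.$$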

2.7.4 Inverses of partitioned matrices 


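Let A be a nonsingular n × n matrix so that its regular inverse $A^{-1}$ exists. Let us partition
A and $A^{-1}$ as follows:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},\quad
A^{-1} = \begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix}.$$

A convenient standard notation of the submatrices of the inverse $A^{-1}$ is used here by
writing the corresponding superscripts. Let us investigate the relationships among the
submatrices. Suppose that the partitioning is conformal to take the product $AA^{-1}$. Suppose
$A_{11}$ and $A^{11}$ are r × r so that the remaining are automatically defined. Let the identity
matrix $I_n$ be partitioned as

$$I_n = \begin{bmatrix} I_r & O \\ O & I_{n-r} \end{bmatrix}.$$

Then writing the equations $AA^{-1} = I_n$ we have

$$AA^{-1} = I_n \Rightarrow
\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
\begin{bmatrix} A^{11} & A^{12} \\ A^{21} & A^{22} \end{bmatrix}
= \begin{bmatrix} I_r & O \\ O & I_{n-r} \end{bmatrix} \Rightarrow$$

(1) $A_{11}A^{11} + A_{12}A^{21} = I_r$

(2) $A_{11}A^{12} + A_{12}A^{22} = O$

(3) $A_{21}A^{11} + A_{22}A^{21} = O$

(4) $A_{21}A^{12} + A_{22}A^{22} = I_{n-r}$.

Premultiplying (2) by $A_{11}^{-1}$, if $A_{11}$ is nonsingular, yields

$$A^{12} = -A_{11}^{-1}A_{12}A^{22}.$$

Substitution in (4) yields

$$(-A_{21}A_{11}^{-1}A_{12} + A_{22})A^{22} = I \Rightarrow A^{22} = (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}.$$

From symmetry it follows that, assuming that the inverses exist,

$$(A^{11})^{-1} = A_{11} - A_{12}A_{22}^{-1}A_{21} \tag{2.7.6}$$

$$(A^{22})^{-1} = A_{22} - A_{21}A_{11}^{-1}A_{12} \tag{2.7.7}$$

$$(A_{11})^{-1} = A^{11} - A^{12}(A^{22})^{-1}A^{21} \tag{2.7.8}$$

$$(A_{22})^{-1} = A^{22} - A^{21}(A^{11})^{-1}A^{12}. \tag{2.7.9}$$

Similarly from (1), (2), (3), (4) we can solve for $A^{12}$, $A^{21}$, $A_{12}$, $A_{21}$ in terms of the other
submatrices. The results in equations (2.7.6) to (2.7.9) are widely applicable in various types
of problems.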
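
The relations between the blocks of A and the blocks of its inverse can be checked with exact arithmetic. The sketch below is illustrative only: the 3 × 3 matrix, the choice r = 2, and the helper names `matmul`, `matsub` and `inverse` are assumptions and not part of the text. The inverse is computed by Gauss-Jordan elimination over fractions.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matsub(A, B):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(A, B)]


def inverse(M):
    """Gauss-Jordan inverse with exact fractions (M assumed nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


A = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]
r = 2
A11 = [row[:r] for row in A[:r]]
A12 = [row[r:] for row in A[:r]]
A21 = [row[:r] for row in A[r:]]
A22 = [row[r:] for row in A[r:]]

Ainv = inverse(A)
Ainv11 = [row[:r] for row in Ainv[:r]]   # the block written A^{11} in the text
Ainv22 = [row[r:] for row in Ainv[r:]]   # the block written A^{22} in the text

# (2.7.7): inverse of A^{22} equals A22 - A21 A11^{-1} A12
rhs_277 = matsub(A22, matmul(matmul(A21, inverse(A11)), A12))
# (2.7.6): inverse of A^{11} equals A11 - A12 A22^{-1} A21
rhs_276 = matsub(A11, matmul(matmul(A12, inverse(A22)), A21))

print(inverse(Ainv22) == rhs_277, inverse(Ainv11) == rhs_276)
```

Working over fractions avoids rounding, so the comparisons are exact rather than approximate.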


Example 2.7.2. Write $z = X_1'A_{11}X_1 + 2X_1'A_{12}X_2$ as the sum of two quadratic forms where
one of them contains $X_1$ and $X_2$ and the other contains only $X_2$, assuming $A_{11} = A_{11}'$ and
nonsingular.


Solution 2.7.2. This is similar to completing the square when we have a term containing
the square of a scalar variable and a second term which is linear in the variable.
In order to see that the result can be achieved let us open up the following quadratic
form, where C is a vector such that $X_1 + C$ is defined:

$$(X_1 + C)'A_{11}(X_1 + C) = X_1'A_{11}X_1 + 2X_1'A_{11}C + C'A_{11}C.$$

Comparing this with z, that is, comparing the quadratic and linear terms in $X_1$, we see
that C corresponds to the following:

$$A_{11}C = A_{12}X_2 \Rightarrow C = A_{11}^{-1}A_{12}X_2.$$

Hence the quantity to be added and subtracted is

$$C'A_{11}C = (A_{11}^{-1}A_{12}X_2)'A_{11}(A_{11}^{-1}A_{12}X_2)
= X_2'A_{12}'A_{11}^{-1}A_{11}A_{11}^{-1}A_{12}X_2
= X_2'A_{12}'A_{11}^{-1}A_{12}X_2.$$

Therefore

$$z = X_1'A_{11}X_1 + 2X_1'A_{12}X_2
= (X_1 + C)'A_{11}(X_1 + C) - X_2'A_{12}'A_{11}^{-1}A_{12}X_2,\quad
C = A_{11}^{-1}A_{12}X_2.$$
Example 2.7.3 (Gaussian density). In multivariate statistical analysis the most prominent
density is the Gaussian density. If X is a real p × 1 vector random variable then X
is said to have a Gaussian density if the density of X is of the form

$$f(X) = c\,e^{-\frac{1}{2}(X - \mu)'V^{-1}(X - \mu)}$$

where $\mu$ is a constant vector, $V = V'$, V is such that the exponent of e remains negative
for all values of X and $\mu$, and c is a normalizing constant such that the integral of f
over X will be unity. If X is partitioned as $X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$ where $X_1$ is r × 1, r < p, evaluate the
density of $X_1$ (the marginal density of $X_1$ is available from f(X) by integrating out $X_2$, the
remaining variables).


Solution 2.7.3. Consider the corresponding partitioning of $\mu$ and V, that is,

$$\mu = \begin{pmatrix} \mu_{(1)} \\ \mu_{(2)} \end{pmatrix},\quad
V = \begin{bmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$

where $\mu_{(1)}$ is r × 1 and $V_{11}$ is r × r. For convenience let $Y_1 = X_1 - \mu_{(1)}$ and $Y_2 = X_2 - \mu_{(2)}$. Then

$$(X - \mu)'V^{-1}(X - \mu) = [Y_1', Y_2']\begin{bmatrix} V^{11} & V^{12} \\ V^{21} & V^{22} \end{bmatrix}\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
= Y_1'V^{11}Y_1 + 2Y_1'V^{12}Y_2 + Y_2'V^{22}Y_2.$$

Writing the linear term and the quadratic term in $Y_2$ with the help of the result in
Example 2.7.2 we have

$$2Y_1'V^{12}Y_2 + Y_2'V^{22}Y_2 = (Y_2 + C)'V^{22}(Y_2 + C) - Y_1'V^{12}(V^{22})^{-1}V^{21}Y_1,\quad
C = (V^{22})^{-1}V^{21}Y_1.$$

Then

$$(X - \mu)'V^{-1}(X - \mu) = Y_1'[V^{11} - V^{12}(V^{22})^{-1}V^{21}]Y_1 + (Y_2 + C)'V^{22}(Y_2 + C).$$

Then from (2.7.8)

$$V^{11} - V^{12}(V^{22})^{-1}V^{21} = V_{11}^{-1}.$$

Therefore

$$(X - \mu)'V^{-1}(X - \mu) = (X_1 - \mu_{(1)})'V_{11}^{-1}(X_1 - \mu_{(1)}) + (Y_2 + C)'V^{22}(Y_2 + C),\quad Y_2 = X_2 - \mu_{(2)}.$$

We want to integrate out $X_2$ from f(X). That is, denoting $\int_{X_2}(\cdot)\,dX_2$ as the integral over
$X_2$,

$$\int_{X_2} f(X)\,dX_2 = c\int_{X_2} e^{-\frac{1}{2}(X - \mu)'V^{-1}(X - \mu)}\,dX_2
= c\,e^{-\frac{1}{2}(X_1 - \mu_{(1)})'V_{11}^{-1}(X_1 - \mu_{(1)})}
\times \int_{X_2} e^{-\frac{1}{2}(Y_2 + C)'V^{22}(Y_2 + C)}\,dX_2.$$

But

$$dX_2 = d(X_2 - \mu_{(2)}) = dY_2 = d(Y_2 + C)$$

because all other quantities are fixed as far as the integral over $X_2$ is concerned. Then
the integral produces only a constant so that the normalizing constant c changes to
another normalizing constant $c_1$. Then the marginal density of $X_1$, denoted by $f_1(X_1)$,
is given by

$$f_1(X_1) = c_1\,e^{-\frac{1}{2}(X_1 - \mu_{(1)})'V_{11}^{-1}(X_1 - \mu_{(1)})},$$

exactly of the same form as f(X) with p replaced by r and with the corresponding
changes in $\mu$ and V. Thus all subsets of the p × 1 vector X have densities belonging
to the same family as f(X) since the exponent is a symmetric function in the components
of $X - \mu$ and since it is proved above that a subvector has the same type of Gaussian
density. If $f_2(X_2)$, which is available from $f_1(X_1)$ with the corresponding changes
or from f(X) directly, denotes the marginal density of $X_2$ then the conditional density
of $X_1$ given $X_2$ (given $X_2$ means $X_2$ is assumed to be a constant vector) is given by
$f(X)/f_2(X_2)$. Show that this conditional density also belongs to the same family as f(X).
(Exercise for the student.)


Before concluding this section a few more applications of matrices will be pointed 
out. More will be considered after introducing the notion of determinants in the next 
chapter. 


2.7.5 Regression analysis 


In statistics, econometrics and other areas, prediction of a variable by observing other
variables or at preassigned values of other variables is an important activity. For example
the world market price for wheat on the next first of January, say y, is a function
of many variables such as $x_1$ = the current Canadian stock of wheat, $x_2$ = the USA stock
of wheat, $x_3$ = the Australian stock of wheat, $x_4$ = the drought situation in a wheat buying
country and so on. If y is to be predicted at preassigned values of $x_1, x_2, \ldots, x_k$ (say
k real variables) then $y = f(x_1, \ldots, x_k)$, some scalar function of $x_1, \ldots, x_k$. If we assume
f to be linear then the prediction function is

$$y = a_0 + a_1x_1 + \cdots + a_kx_k \tag{2.7.10}$$

where $x_1, \ldots, x_k$ can be preassigned but $a_0, \ldots, a_k$ are unknown. The above model is
called a linear regression model because the model assumes that the expected value
of y at preassigned values of $x_1, \ldots, x_k$ ("preassigned" means that if $x_1$ = 1 million tons,
if $x_2$ = 3 million tons, etc. what will be y, y = the price per bushel on next January first)
is of the form in (2.7.10). [Regression of y on $x_1, \ldots, x_k$ is the conditional expectation of
y at given values of $x_1, \ldots, x_k$ because it can be proved that this conditional expectation
is the best predictor of y, best in the minimum mean square sense. For evaluating
this regression function we need the conditional distribution of y at given values of
$x_1, \ldots, x_k$. In the absence of conditional distribution, we will assume that this best
predictor is of a certain form, such as a linear function.] We have the data points from
previous years' readings on y as well as on $x_1, \ldots, x_k$. If the assumed model is an exact
mathematical relationship then for every data point $(y_j, x_{1j}, \ldots, x_{kj})$ the equation (2.7.10)
is satisfied. This is not the reality. The model is simply assumed to hold. There may or
may not be such a relationship. Hence if $e_j$ denotes the error in $y_j$ for using the model
in (2.7.10) then

$$e_j = y_j - [a_0 + a_1x_{1j} + \cdots + a_kx_{kj}],\quad j = 1, \ldots, n \tag{2.7.11}$$

if there are n data points. Since k + 1 parameters $a_0, \ldots, a_k$ are to be estimated n has to
be at least k + 1. Let, for $n \ge k + 1$,


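$$e = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix},\quad
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix},\quad
\beta = \begin{pmatrix} a_0 \\ \vdots \\ a_k \end{pmatrix},\quad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{k1} \\ 1 & x_{12} & \cdots & x_{k2} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{kn} \end{bmatrix}.$$

Then

$$Y = X\beta + e$$

and the error sum of squares is then

$$e'e = (Y - X\beta)'(Y - X\beta) \tag{2.7.12}$$

where X is a known matrix, Y is a known set of observations, and $\beta$ is the only unknown
vector. The maximum value of $e'e$ for arbitrary $\beta$ is at $+\infty$, $e'e$ being non-negative. If $\beta$
is estimated by minimizing (2.7.12) the method is called the method of least squares.
Recall from Chapter 1 the partial differential operator, which in the present situation
is,

$$\frac{\partial}{\partial\beta} = \begin{pmatrix} \frac{\partial}{\partial a_0} \\ \vdots \\ \frac{\partial}{\partial a_k} \end{pmatrix}$$

and the effects of operating on a quadratic form and linear form are already considered
earlier. Thus

$$\frac{\partial}{\partial\beta}e'e = O \Rightarrow -2X'(Y - X\beta) = O \Rightarrow X'X\beta = X'Y. \tag{2.7.13}$$

Since the $x_j$'s are preselected, without loss of generality we can assume $X'X$ to be nonsingular
(when data collected from the field are used sometimes $X'X$ can be singular
or nearly singular). If $X'X$ is nonsingular then, denoting the estimated value of $\beta$ by
$\hat{\beta}$, we have

$$\hat{\beta} = (X'X)^{-1}X'Y. \tag{2.7.14}$$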
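
The normal equations (2.7.13) can be solved directly for a small data set. The following is a minimal sketch, assuming one regressor (k = 1) and four made-up data points; the data values and the helper name `solve` are hypothetical. Exact fractions are used so that the checks hold without rounding.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination (M assumed nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


x_values = [0, 1, 2, 3]             # preassigned values of the single regressor
Y = [1, 3, 5, 8]                    # hypothetical observations
X = [[1, x] for x in x_values]      # first column of ones, as in the text

XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(2)]

beta_hat = solve(XtX, XtY)          # solves X'X beta = X'Y
residuals = [y - sum(b * v for b, v in zip(beta_hat, r)) for r, y in zip(X, Y)]
normal_check = [sum(r[i] * e for r, e in zip(X, residuals)) for i in range(2)]
error_ss = sum(e * e for e in residuals)

print(beta_hat, error_ss, normal_check)
```

The vector `normal_check` is X'(Y - X beta_hat), which must vanish at the least squares solution.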


The following properties are easy to establish (left as exercises to the student): 


(ii) The least square minimum $s^2$, that is the right side of (2.7.12) when $\beta$ is replaced
by $\hat{\beta}$, is a quadratic form of the type

$$s^2 = Y'[I - X(X'X)^{-1}X']Y = Y'[I - B]Y,\quad B = X(X'X)^{-1}X'.$$

(iii) $I - B$ is idempotent of rank n - (k + 1), B is idempotent of rank k + 1 and further,
$I - B$ and B are orthogonal to each other. Also

$$Y'Y = Y'[I - B]Y + Y'BY.$$

When $I - B$ and B are orthogonal to each other and when e has a standard multivariate
Gaussian distribution it can be proved that $Y'[I - B]Y$ and $Y'BY$ are statistically
independently distributed. Comparison of the sum of squares due to $\beta$, namely $Y'BY$,
with the residual sum of squares, namely $Y'[I - B]Y$, is the basis in regression analysis
and in a large variety of statistical inference problems, such as testing statistical
hypotheses on $\beta$ or on the individual components of $\beta$.
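
Properties (ii) and (iii) can be illustrated on a small design matrix. The sketch below takes B to be $X(X'X)^{-1}X'$ and uses hypothetical data with n = 4 and k + 1 = 2; the helper names are illustrative, and a closed-form 2 × 2 inverse is used only because $X'X$ is 2 × 2 here.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse_2x2(M):
    """Closed-form inverse of a nonsingular 2 x 2 matrix."""
    (p, q), (r, s) = M
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]


X = [[1, 0], [1, 1], [1, 2], [1, 3]]   # hypothetical design matrix
Y = [[1], [3], [5], [8]]               # hypothetical observations (column vector)
n = len(X)

Xt = transpose(X)
B = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)
I_minus_B = [[int(i == j) - B[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(M):
    """Y' M Y as a scalar."""
    return matmul(matmul(transpose(Y), M), Y)[0][0]


trace_B = sum(B[i][i] for i in range(n))
total_ss = sum(y[0] ** 2 for y in Y)

print(matmul(B, B) == B, trace_B)
print(quadratic_form(I_minus_B), quadratic_form(B), total_ss)
```

For an idempotent matrix the trace equals the rank, so `trace_B` equal to 2 agrees with the rank k + 1 stated in (iii).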


2.7.6 Design of experiments 


Another prominent area of applied statistics is the design of experiments and
analysis of variance. Suppose that 3 different methods of teaching (say comparison of
instructors) are to be studied for their effectiveness. Suppose that 3 sets of students
of, say 30 each, with exactly the same background are selected and subjected to the
3 different methods, one set of 30 under method 1, another set of 30 under method 2
and a third set of 30 under method 3. The designing aspect of the experiment is to control
all possible known factors, other than the methods of teaching, which may contribute
towards the grade of the student. Let $y_{ij}$ be the grade of the j-th student under method i.
Here i = 1, 2, 3 and j = 1, 2, ..., 30. In general, we may want to compare k methods
(i = 1, ..., k) and under the i-th method there may be $n_i$ students ($n_1$ students under method
1, $n_2$ students under method 2 and so on). Then a linear, additive, fixed effect model
is the following:

$$y_{ij} = \mu + \alpha_i + e_{ij} \tag{2.7.15}$$

where $\mu$ is a general effect (the student would have got some grade if she/he had studied
on her/his own; no instructor or method was involved), $\alpha_i$ the deviation from the
general effect due to the i-th method of teaching, $e_{ij}$ the random part (sum total contribution
coming from all unknown factors. Remember that all known factors which
may contribute towards $y_{ij}$ are controlled by properly designing the experiment). Note
that $\mu, \alpha_1, \ldots, \alpha_k$ are all unknown. The $y_{ij}$'s are observed. The final aim is to test statistical
hypotheses on the $\alpha_i$'s such as the hypothesis that all the methods are equally effective
($\alpha_1 = \cdots = \alpha_k$). The first step towards the analysis is to estimate $\mu, \alpha_1, \ldots, \alpha_k$. We use the
method of least squares. When $\mu, \alpha_1, \ldots, \alpha_k$ are all assumed to be unknown constants
we minimize the error sum of squares for estimating the parameters.

$$\sum_{i=1}^{k}\sum_{j=1}^{n_i} e_{ij}^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \mu - \alpha_i)^2. \tag{2.7.16}$$

The following results can be easily established (left as exercises to the student):


(iv) The least square estimate of $\alpha_i$ is given by

$$\hat{\alpha}_i = \frac{y_{i.}}{n_i} - \frac{y_{..}}{n_.},\quad
y_{i.} = \sum_{j=1}^{n_i} y_{ij},\quad
y_{..} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij},\quad
n_. = \sum_{i=1}^{k} n_i.$$

(v) The least square minimum, in this case, is given by

$$s^2 = \sum_{i,j} y_{ij}^2 - \sum_{i} \frac{y_{i.}^2}{n_i}.$$

The problem described above is part of the analysis in a one-way classification model
arising from a completely randomized design in the field of Design of Experiments.
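
The estimates in a one-way classification can be computed from group totals. The sketch below uses made-up grades for k = 3 methods with unequal group sizes. It assumes the usual side condition that the weighted deviations $n_i\alpha_i$ sum to zero, so that $\mu$ is estimated by the grand mean; this condition is an assumption of the sketch and is not stated in the text.

```python
from fractions import Fraction

# Hypothetical grades: one inner list per method of teaching.
groups = [[1, 2, 3], [4, 6], [5, 5, 5, 9]]

n_i = [len(g) for g in groups]          # group sizes
y_i = [sum(g) for g in groups]          # group totals
n_dot = sum(n_i)
y_dot = sum(y_i)

mu_hat = Fraction(y_dot, n_dot)         # grand mean (assumed side condition)
alpha_hat = [Fraction(t, n) - mu_hat for t, n in zip(y_i, n_i)]

# Least square minimum computed two ways: from residuals and from totals.
s2_residuals = sum((y - mu_hat - a) ** 2
                   for g, a in zip(groups, alpha_hat) for y in g)
s2_totals = (sum(y * y for g in groups for y in g)
             - sum(Fraction(t * t, n) for t, n in zip(y_i, n_i)))

print(alpha_hat)
print(s2_residuals, s2_totals)
```

Only the sums $\mu + \alpha_i$ are determined by the data, so a different side condition would shift `mu_hat` and `alpha_hat` by a common constant while leaving the minimum unchanged.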


Exercises 2.7 


2.7.1. Let 
1 2 
3 0 -1 2 e 
A-|1 -1 0 1], B= ; 
1 1 
1 1 1 1 
2 1 
1 2 1 
-1 0 1 A A 0 
C= a 11 12 -A 3 | 
1 1 1 Ay Ay 1 -1 
2 1 1 


(a) Give all possible partitionings of B so that AB is defined. Evaluate AB by using the
submatrices for each partition of B.

(b) Give all possible partitionings of C so that AC is defined. Evaluate AC in terms of
the product of submatrices for each partition of C.


2.7.2. By using equations (2.7.6) to (2.7.9) and the corresponding equations for $A^{12}$ and
$A^{21}$, evaluate the inverses of the following matrices A, B and C, assuming the inverses
exist, by partitioning into convenient submatrices:


a b a 0 0 a, a a 
A = f à > B = 0 b Cc > C = bi b; b; 
0 d e O Cy G 


2.7.3. By partitioning the matrix A and looking at the leading submatrices answer the 
following question: If X' AX = 0 for all possible vectors X is A = 0? 


2.7.4. Variance, covariance, correlations. E = expected value is an operator in
Statistics. Let X be a p × 1 vector of real scalar random variables. Then $\mu = E(X)$ = the
mean value of X, and $V = E[(X - \mu)(X - \mu)']$ is the covariance matrix of X. If $V = (v_{ij})$ then
$v_{ii}$ is the variance of $x_i$, the i-th component in X, and $v_{ij}$ is the covariance between $x_i$
and $x_j$. Then $\rho_{ij} = \frac{v_{ij}}{\sqrt{v_{ii}v_{jj}}}$ is the correlation between $x_i$ and $x_j$. Let

$$X = \begin{pmatrix} x_1 \\ X_2 \end{pmatrix},\quad X_2' = (x_2, \ldots, x_p).$$

Let V be partitioned correspondingly. That is,

$$V = \begin{bmatrix} v_{11} & V_{12} \\ V_{21} & V_{22} \end{bmatrix}$$

where $v_{11}$ is 1 × 1. Let $u = a'X_2$, where $a' = (a_2, \ldots, a_p)$ is a constant vector. This means that
we are considering a linear function of $X_2$. Show the following: (a) the variance of u is
$a'V_{22}a$, (b) the covariance between $x_1$ and $X_2$ is $V_{12} = V_{21}'$, (c) the square of the maximum
correlation possible between $x_1$ and u is

$$\frac{V_{12}V_{22}^{-1}V_{21}}{v_{11}} = \rho^2_{1(2\ldots p)}$$

which is called the square of the multiple correlation between $x_1$ and $X_2$.


2.7.5. Show that $\rho^2_{1(23)} \ge \rho^2_{1(2)}$ where $\rho^2_{1(\cdot)}$ is defined in Exercise 2.7.4. Generalize the
result to show that $\rho^2_{1(2\ldots p)}$ increases with p.


2.7.6. If A is real symmetric then show that A can be written as

$$PAP' = \begin{bmatrix} I_r & O & O \\ O & -I_s & O \\ O & O & O \end{bmatrix}$$

where r + s is the rank of A and P is a nonsingular matrix. [The diagonal block O may
be absent.]


2.7.7. If A is n × m, n > m, such that $A'A = I_m$, then show that there exists an n × (n - m)
matrix B such that (A, B) is orthonormal.


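2.7.8. Kronecker product. Let $A = (a_{ij})$ be p × p and $B = (b_{ij})$ be q × q. Consider the
pq × pq matrix $A \otimes B$,

$$A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1p}B \\ a_{21}B & a_{22}B & \cdots & a_{2p}B \\ \vdots & \vdots & & \vdots \\ a_{p1}B & a_{p2}B & \cdots & a_{pp}B \end{bmatrix}.$$

Then $A \otimes B$ is known as the Kronecker product of A with B.
(1) Evaluate the Kronecker product $A \otimes B$ where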


2 5 
2 0 
A-|[3 0 -1], B=] | 
1 -1 
1 4 2 


(2) Compute $B \otimes A$ and compare with the result in (1).


2.7.9. Let A be an m × n real matrix of rank r. Show that there exists a nonsingular
m × m matrix B and an orthonormal matrix C such that

$$A = B\begin{bmatrix} I_r & O \\ O & O \end{bmatrix}C.$$


2.7.10. Let A be an m × n real matrix, m > n. Show that there exists an m × m orthonormal
matrix B such that

$$BA = \begin{pmatrix} T \\ O \end{pmatrix}$$

where T is an n × n upper triangular matrix and O is an (m - n) × n null matrix.


2.7.11. Let X be an n × 1 vector of unit length. Show that $I_n - 2XX'$ is an orthonormal
matrix.


2.7.12. Let A be an m × n real matrix of rank r with m > n. Show that there exists an
m × (m - n) semiorthonormal matrix Q, $Q'Q = I$, and an n × n upper triangular matrix
P such that


2.7.13. Consider the following diagonal block matrix D where A, B, C are n × n, m × m
and r × r square matrices. If D is nonsingular then show that $D^{-1} = \mathrm{diag}(A^{-1}, B^{-1}, C^{-1})$
that is,

$$D = \begin{bmatrix} A & O & O \\ O & B & O \\ O & O & C \end{bmatrix},\quad
D^{-1} = \begin{bmatrix} A^{-1} & O & O \\ O & B^{-1} & O \\ O & O & C^{-1} \end{bmatrix}.$$


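2.7.14. If $A = \begin{bmatrix} B & u \\ v' & d \end{bmatrix}$ and $A^{-1} = \begin{bmatrix} C & w \\ x' & \delta \end{bmatrix}$ where u, v, w, x are vectors and d and $\delta$ are scalars
(1 × 1 matrices), then show that

$$B^{-1} = C - w\delta^{-1}x',\quad C = B^{-1} + \delta B^{-1}uv'B^{-1},$$

$$w = -\delta B^{-1}u,\quad x' = -\delta v'B^{-1}.$$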


2.715. If A = [5 2] and nonsingular then show that 


I -BC 
Adz . 


Additional problems on vectors and matrices 


Use the 8 conditions listed after Example 2.6.7 as axioms to define a vector space,
assume scalar multiplication and addition are as done in the case of matrices and
assume that the scalars are real or complex numbers. Then establish the results in
Exercises 2.1 to 2.4 below.


2.1. Show that the following sets are vector spaces satisfying all the conditions as
mentioned above.

(i) The collection of vectors as n-tuples for a fixed n, as we have defined in Chapter 1
where the elements are real or complex numbers;

(ii) The collection of all polynomials in a real scalar variable of degree less than n, for
a given n, with coefficients real or complex numbers;

(iii) The collection of all real-valued functions of a real variable which are differentiable;

(iv) The collection of all n × n matrices, with given n, where the elements are real or
complex numbers.


2.2. Construct one basis each for the vector spaces in Exercise 2.1 (i)-(iv).


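2.3. Let V be a collection of couplets of real and positive numbers, that is,
$V = \{(\alpha, \beta) : \alpha > 0, \beta > 0\}$. Define addition and scalar multiplication as follows:

$$(\alpha_1, \beta_1) + (\alpha_2, \beta_2) = (\alpha_1\alpha_2, \beta_1\beta_2) \text{ for every } (\alpha_1, \beta_1) \text{ and } (\alpha_2, \beta_2) \text{ in } V,$$

$$c(\alpha, \beta) = (\alpha^c, \beta^c) \text{ for every real number } c \text{ and } (\alpha, \beta) \text{ in } V.$$

Show that V is a vector space over the field of real numbers.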




2.4. Show that the concepts of linear dependence and linear independence of vectors,
basis of a vector space and dimensions of vector spaces as defined for ordered n-tuples
of real numbers also go through for the general vector space defined above. [More
on abstract vector spaces may be seen from more mathematically oriented books on
vectors and matrices, see for example, [9].]


2.5. Find matrices A for which $A'A = O$, $A \ne O$.
2.6. Show that 


is lel cosnÜ k sinn 


> k 0, E E 
-isin8 cos 0 -isinnó E i 


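2.7. Kronecker product. The Kronecker product $A \otimes B$ is defined as $A \otimes B = (a_{ij}B)$. If
$A = (a_{ij})$ is p × q and $B = (b_{ij})$ is m × n then $A \otimes B$ is pm × qn. For example let

$$A = \begin{bmatrix} 1 & -1 & 0 \\ 2 & 3 & -2 \end{bmatrix},\quad B = \begin{bmatrix} 4 & 5 \\ -3 & 7 \end{bmatrix}.$$

Then

$$A \otimes B = \begin{bmatrix} (1)B & (-1)B & (0)B \\ (2)B & (3)B & (-2)B \end{bmatrix}
= \begin{bmatrix} 4 & 5 & -4 & -5 & 0 & 0 \\ -3 & 7 & 3 & -7 & 0 & 0 \\ 8 & 10 & 12 & 15 & -8 & -10 \\ -6 & 14 & -9 & 21 & 6 & -14 \end{bmatrix}$$

and

$$B \otimes A = \begin{bmatrix} (4)A & (5)A \\ (-3)A & (7)A \end{bmatrix}
= \begin{bmatrix} 4 & -4 & 0 & 5 & -5 & 0 \\ 8 & 12 & -8 & 10 & 15 & -10 \\ -3 & 3 & 0 & 7 & -7 & 0 \\ -6 & -9 & 6 & 14 & 21 & -14 \end{bmatrix} \ne A \otimes B.$$

Write down $A \otimes B$ and $B \otimes A$ if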
A= nen and B- 3 2 5 
-2 0 -3 4 
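
The worked Kronecker product example of this exercise can be reproduced in a few lines. The sketch below is illustrative: it uses the 2 × 3 matrix A with rows (1, -1, 0), (2, 3, -2) and the 2 × 2 matrix B with rows (4, 5), (-3, 7), which are the matrices consistent with the block entries displayed in that example, and `kron` is a hypothetical helper name.

```python
def kron(A, B):
    """Kronecker product: the block matrix (a_ij * B), as a list of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


A = [[1, -1, 0],
     [2, 3, -2]]
B = [[4, 5],
     [-3, 7]]

A_kron_B = kron(A, B)
B_kron_A = kron(B, A)

for row in A_kron_B:
    print(row)
print()
for row in B_kron_A:
    print(row)

# Both products are 4 x 6 here, yet they differ entry by entry.
print(A_kron_B == B_kron_A)
```

The two products contain the same numbers in a different arrangement, which is why the order of the factors matters.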


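2.8. vec(X). Let $X = (x_{ij})$ be a p × q matrix. Let the j-th column of X be denoted by $x_{(j)}$.
Consider the pq × 1 vector formed by appending $x_{(1)}, x_{(2)}, \ldots, x_{(q)}$ into a long column.
This is defined as vec(X). That is,

$$\mathrm{vec}(X) = \begin{pmatrix} x_{(1)} \\ x_{(2)} \\ \vdots \\ x_{(q)} \end{pmatrix}.$$

For example, let

$$A = \begin{bmatrix} 1 & -1 & 1 \\ 2 & 0 & 5 \end{bmatrix} \Rightarrow
\mathrm{vec}(A) = \begin{pmatrix} 1 \\ 2 \\ -1 \\ 0 \\ 1 \\ 5 \end{pmatrix}
\text{ and } [\mathrm{vec}(A)]' = (1, 2, -1, 0, 1, 5).$$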


Form vec(A) for the following matrices: 


10 -1 1 1 
(0 A=|]2 1 al, @ As|-31.00|, Gi) A=h. 
O 1 O 2 5 


2.9. Show that if A is p × q, X is q × r and B is r × s then the ps × 1 vector
$\mathrm{vec}(AXB) = (B' \otimes A)\mathrm{vec}(X)$.
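
The identity in Exercise 2.9 can be tested on small integer matrices before attempting a proof. The following is a minimal sketch with made-up matrices; `vec` stacks columns as defined in Exercise 2.8, and all helper names are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(M):
    return [list(col) for col in zip(*M)]


def kron(A, B):
    """Kronecker product: the block matrix (a_ij * B)."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def vec(M):
    """Stack the columns of M into one long list."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]


A = [[1, 2], [3, 4]]               # p x q with p = q = 2
X = [[1, 0, 2], [-1, 3, 1]]        # q x r with r = 3
B = [[2, 1], [0, 1], [1, -1]]      # r x s with s = 2

lhs = vec(matmul(matmul(A, X), B))
K = kron(transpose(B), A)          # (B' kron A), a ps x qr matrix
rhs = [sum(k * v for k, v in zip(row, vec(X))) for row in K]

print(lhs, rhs)
```

A numerical agreement for one choice of matrices only illustrates the identity; the exercise still asks for a general proof.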


2.10. If Y = AX where X and Y are p × q matrices of functionally independent real
variables, A is a p × p nonsingular matrix of constants and if (dY) and (dX) represent
the matrices of differentials in Y and X respectively then show that

$$\mathrm{vec}(dY) = (I \otimes A)\mathrm{vec}(dX).$$


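2.11. If $A, B, C, D$ are matrices for which the following sums and products are defined then establish the following results:

(i) $A \otimes B \otimes C = (A \otimes B) \otimes C = A \otimes (B \otimes C)$

(ii) $(A + B) \otimes (C + D) = (A \otimes C) + (A \otimes D) + (B \otimes C) + (B \otimes D)$

(iii) $(A \otimes B)(C \otimes D) = AC \otimes BD$

(iv) $a \otimes A = aA = A \otimes a$, $a$ a scalar

(v) $(A \otimes B)' = A' \otimes B'$

(vi) $\operatorname{tr}(A \otimes B) = [\operatorname{tr}(A)][\operatorname{tr}(B)]$

(vii) $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$

(viii) $a' \otimes b = ba' = b \otimes a'$


where $a$ and $b$ are two column vectors, not necessarily of the same order. Also if $A$ is a partitioned matrix where $A_{11}, A_{12}, A_{21}, A_{22}$ are submatrices and if $B$ is any other matrix then show that

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \;\Rightarrow\; A \otimes B = \begin{bmatrix} A_{11} \otimes B & A_{12} \otimes B \\ A_{21} \otimes B & A_{22} \otimes B \end{bmatrix}.$$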


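2.12. For two matrices $A$ and $B$ show that $A \otimes B$ is nonsingular if and only if $A$ and $B$ are nonsingular.

2.13. For two matrices $A$ and $B$ show that $\operatorname{vec}(A) = \operatorname{vec}(B)$ does not necessarily imply that $A = B$.


2.14. For any two column vectors $a$ and $b$ show that
$$\operatorname{vec}(a) = \operatorname{vec}(a') \quad \text{and} \quad \operatorname{vec}(ab') = b \otimes a.$$
2.15. For three matrices $A, B, C$ for which the product $ABC$ is defined show that
$$\operatorname{vec}(ABC) = (C' \otimes A)\operatorname{vec}(B) \quad \text{and} \quad \operatorname{vec}(AB) = (B' \otimes I)\operatorname{vec}(A).$$
2.16. If $A$ is $m \times n$, $B$ is $n \times r$ and $a$ is $r \times 1$ then show that

$$\operatorname{vec}(AB) = (B' \otimes I_m)\operatorname{vec}(A) = (B' \otimes A)\operatorname{vec}(I_n) = (I_r \otimes A)\operatorname{vec}(B),$$
$$ABa = (a' \otimes A)\operatorname{vec}(B) = (A \otimes a')\operatorname{vec}(B').$$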


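2.17. If $A, B, C$ are $n \times n$ matrices and further, if $C = C'$ then show that
$$[\operatorname{vec}(C)]'(A \otimes B)\operatorname{vec}(C) = [\operatorname{vec}(C)]'(B \otimes A)\operatorname{vec}(C).$$
2.18. For any $m \times n$ matrix $A$ show that
$$\operatorname{vec}(A) = (I_n \otimes A)\operatorname{vec}(I_n) = (A' \otimes I_m)\operatorname{vec}(I_m).$$

2.19. For any four matrices $A, B, C, D$ let $ABCD$ be defined such that $ABCD$ is a square matrix. Then show that

$$\operatorname{tr}(ABCD) = [\operatorname{vec}(D')]'(C' \otimes A)\operatorname{vec}(B) = [\operatorname{vec}(D)]'(A \otimes C')\operatorname{vec}(B').$$
2.20. If $\rho(A)$ and $\rho(B)$ denote the ranks of $A$ and $B$ respectively then show that
$$\rho(A)\,\rho(B) = \rho(A \otimes B).$$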


2.21. Let $A$ be a square matrix of order $n$ and let $p(x)$ be a polynomial of degree $k$ in the scalar variable $x$, that is

$$p(x) = a_0 + a_1x + \cdots + a_kx^k.$$
Let $p(x), q(x), h(x), t(x)$ be polynomials in $x$ such that
$$p(x) + q(x) = h(x) \quad \text{and} \quad p(x)q(x) = t(x).$$
Define
$$p(A) = a_0I + a_1A + \cdots + a_kA^k.$$

Then show that

$$p(A) + q(A) = h(A) \quad \text{and} \quad p(A)q(A) = t(A).$$

2.22. If $\rho(\cdot)$ denotes the rank of $(\cdot)$ and if $AB$ and $A + B$ are defined then show that
$$\rho(AB) \le \rho(A), \quad \rho(AB) \le \rho(B), \quad \rho(A + B) \le \rho(A) + \rho(B).$$


2.23. For the following matrix $A$ take $A^k$, $k = 1, 2, \ldots$ by using a computer. Then show that for all $k > 16$ the matrix $A^k$ remains the same (up to three decimal places) where

$$A = \begin{bmatrix} 0.8 & 0.2 & 0.1 \\ 0.1 & 0.7 & 0.3 \\ 0.1 & 0.1 & 0.6 \end{bmatrix}.$$


2.24. Take a 2 x 2 singly stochastic matrix $A$ with nonzero elements and with the column sums equal to 1. Take powers of $A$ by using a computer. Explain the behavior of $A^k$ for large $k$.


2.25. Repeat the process in Exercise 2.24 for a 3 x 3 singly stochastic matrix A and 
explain the behavior for large k. 
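Exercises 2.23 to 2.25 ask for matrix powers "by using a computer". One possible way to set this up is sketched below; the 2 x 2 matrix is an arbitrary illustrative choice with column sums equal to 1, and the function names and the tolerance are assumptions, not something prescribed by the text.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def powers_until_stable(A, tol=1e-12, max_power=10_000):
    """Return (k, A^k) for the first k where A^k and A^(k-1) agree within tol."""
    previous = A
    for k in range(2, max_power + 1):
        current = matmul(previous, A)
        change = max(abs(c - p)
                     for row_c, row_p in zip(current, previous)
                     for c, p in zip(row_c, row_p))
        if change < tol:
            return k, current
        previous = current
    raise ValueError("powers did not stabilise within max_power steps")


# Illustrative singly stochastic matrix: each column sums to 1.
A = [[0.9, 0.2],
     [0.1, 0.8]]

k, limit = powers_until_stable(A)
print(k)
for row in limit:
    print([round(x, 6) for x in row])
```

For this particular matrix both columns of the limit approach the vector $(2/3, 1/3)'$, which is the vector $x$ with $Ax = x$ and elements summing to 1. Replacing `A` by a 3 x 3 matrix with unit column sums lets one repeat the experiment for Exercise 2.25.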


3 Determinants 


3.0 Introduction 


A determinant is an explicit scalar function of the elements of a square matrix. It is defined only for square matrices. A scalar function means a function which when evaluated is a 1 x 1 matrix. If the elements are real or complex numbers then this function will also be a real or complex number. A few scalar functions of a 2 x 2 matrix, $A = \begin{bmatrix} x_1 & x_2 \\ x_3 & x_4 \end{bmatrix}$, denoted by $u_1, u_2, u_3$, are the following:

$$u_1 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = \operatorname{tr}(AA'),$$

$$u_2 = x_1x_4 - x_2x_3,$$

$$u_3 = x_1 + x_4 = \operatorname{tr}(A).$$
All the above functions are 1 x 1 matrices or scalars. Out of these three we will be interested in a function of the type $u_2$. Why are we interested in such a scalar function? It is
mainly because of its applications in various fields. In Chapter 2 we have seen that ma- 
trices appear naturally in many areas. More examples from several other areas could 
also have been given in Chapter 2. In all such problems, where square matrices come 
in, determinants can enter automatically. 

We will define the determinant of an n x n matrix as a scalar function satisfying certain conditions. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ denote the $n$ rows (or columns) of an $n \times n$ matrix $A = (a_{ij})$. Then, if they are rows,

$$\alpha_i = (a_{i1}, \ldots, a_{in}), \quad i = 1, \ldots, n.$$
That is,

$$\alpha_1 = (a_{11}, a_{12}, \ldots, a_{1n}) = \text{first row},$$

$$\alpha_2 = (a_{21}, a_{22}, \ldots, a_{2n}) = \text{second row, and so on.}$$


3.1 Definition of the determinant of a square matrix 


Let $f$ be a scalar function (not a vector function or a matrix function) of $\alpha_1, \ldots, \alpha_n$, called the determinant of $A$, satisfying the following conditions:

$$f(\alpha_1, \ldots, c\alpha_i, \ldots, \alpha_n) = cf(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n), \qquad (\alpha)$$

where $c$ is a scalar. This condition means that if any row (column) is multiplied by a scalar then it is equivalent to multiplying the whole determinant by that scalar. This scalar quantity can also be zero.

$$f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_n) = f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n). \qquad (\beta)$$


Open Access. © 2017 Arak M. Mathai, Hans J. Haubold, published by De Gruyter. This work is licensed
This condition says that if the $i$-th row (column) is added to the $j$-th row (column) the value of the determinant remains the same. The $i$-th row (column) remains the same but the new $j$-th row is $\alpha_j + \alpha_i$, that is, the original $j$-th row (column) plus the original $i$-th row (column). Combination of conditions $(\alpha)$ and $(\beta)$ shows that the value of the determinant remains the same if a constant multiple of one row (column) is added to another row (column).

Let $\alpha_i$ be written as a sum of two vectors, $\alpha_i = \beta_i + \gamma_i$. Then the next condition is that

$$f(\alpha_1, \ldots, \beta_i + \gamma_i, \ldots, \alpha_n) = f(\alpha_1, \ldots, \beta_i, \ldots, \alpha_n) + f(\alpha_1, \ldots, \gamma_i, \ldots, \alpha_n). \qquad (\gamma)$$

This means that if the $i$-th row (column) is split as the sum of two vectors, $\beta_i + \gamma_i$, then the determinant becomes the sum of two determinants where in one the $i$-th row (column) is replaced by $\beta_i$ and in the other the $i$-th row (column) is replaced by $\gamma_i$.

$$f(e_1, \ldots, e_n) = 1 \qquad (\delta)$$

where $e_1, \ldots, e_n$ are the basic unit vectors. This condition says that the determinant of an identity matrix is 1.

The above conditions can be called the postulates or axioms to define the determinant of a square matrix. The standard notations used to denote the determinant of a square matrix $A$ are the following:

$$|A|, \quad \det(A) = \text{determinant of } A.$$

In the first notation above, $A$ is enclosed by vertical bars. The matrix was enclosed by ordinary or square brackets.

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \text{matrix } A, \qquad |A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \text{determinant of } A.$$
c d 


Let us evaluate the determinant of this 2 x 2 matrix $A$ by using the postulates above. Let $a \neq 0$. Then by postulate $(\alpha)$,

$$|A| = a\begin{vmatrix} 1 & b/a \\ c & d \end{vmatrix}.$$

We have divided the first row by $a$ and in order to keep the value of the determinant the same we kept $a$ outside also, because dividing any row by $a \neq 0$ is equivalent to dividing the whole determinant by $a$. In other words we have taken $a$ outside from the first row. Now add $(-c)$ times the first row of the determinant on the right to the second row. By postulates $(\alpha)$ and $(\beta)$, the value of the determinant remains the same. Then

$$\begin{aligned}
|A| &= a\begin{vmatrix} 1 & b/a \\ 0 & d - \frac{cb}{a} \end{vmatrix} && \text{(postulates } (\alpha), (\beta)) \\
&= a\left(d - \frac{cb}{a}\right)\begin{vmatrix} 1 & b/a \\ 0 & 1 \end{vmatrix} && \text{(postulate } (\alpha)) \\
&= a\left(d - \frac{cb}{a}\right)\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} && \text{(postulates } (\alpha), (\beta)) \\
&= a\left(d - \frac{cb}{a}\right) && \text{(postulate } (\delta)) \\
&= ad - bc.
\end{aligned}$$

If $a = 0$ then by adding other rows or columns one can bring a nonzero element at the (1,1)-th position without altering the value of the determinant, unless all elements on the first row or first column are zeros. In such a case the determinant is zero by postulate $(\alpha)$ itself. Thus, in general, we have the following result:
(i) The determinant of a 2 x 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $|A| = ad - bc$.
For example,

$$A = \begin{bmatrix} 2 & 3 \\ -1 & 5 \end{bmatrix} \;\Rightarrow\; |A| = (2)(5) - (3)(-1) = 13;$$

$$B = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix} \;\Rightarrow\; |B| = (2)(5) - (0)(0) = 10;$$

$$C = \begin{bmatrix} 2 & 3 \\ 0 & 5 \end{bmatrix} \;\Rightarrow\; |C| = (2)(5) - (3)(0) = 10.$$

3.1.1 Some general properties 
A few properties follow immediately from the definition itself. 


(ii) The determinant of a square null matrix is zero. The determinant of a square 
matrix with one or more rows or columns null is zero. 

(iii) The determinant of a diagonal matrix is the product of the diagonal elements. 
[This means that if any diagonal element in a diagonal matrix is zero then the de- 
terminant is zero.] 

(iv) The determinant of a triangular (upper or lower) matrix is the product of the 
diagonal elements. [This means that if any diagonal element in a triangular matrix 
is zero then its determinant is zero.] 


For example,

$$A = \begin{bmatrix} 0 & a_2 & a_3 \\ 0 & b_2 & b_3 \\ 0 & c_2 & c_3 \end{bmatrix} \;\Rightarrow\; |A| = 0;$$

$$D = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix} \;\Rightarrow\; |D| = d_1d_2d_3;$$

$$T = \begin{bmatrix} a_1 & a_2 & a_3 \\ 0 & b_2 & b_3 \\ 0 & 0 & c_3 \end{bmatrix} \;\Rightarrow\; |T| = a_1b_2c_3.$$

In $A$ above, it is equivalent to multiplying the first column by the scalar $c = 0$ and hence by postulate $(\alpha)$ the determinant of $A$ is zero. In the triangular case $T$ above, by repeated application of postulates $(\alpha)$ and $(\beta)$ the elements at the (1,2)-th, (1,3)-th and (2,3)-th positions in $T$ can be made zero without affecting the value of the determinant. Then applying (iii) above the result follows.


(v) If any two rows (columns) are interchanged then the value of the determinant 
of the new matrix (obtained after the interchange) is (-1) times the value of the 
determinant of the original matrix. 


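For example,

$$A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 1 \\ 0 & -1 & 2 \end{bmatrix} \quad \text{and} \quad A_1 = \begin{bmatrix} 0 & -1 & 2 \\ 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix} \;\Rightarrow\; |A_1| = -|A| \;\text{ or }\; |A| = -|A_1|.$$

Here $A_1$ is obtained from $A$ by interchanging the first and the third rows. Let

$$A_2 = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 1 & 1 \\ -1 & 0 & 2 \end{bmatrix} \;\Rightarrow\; |A_2| = -|A|.$$

Here $A_2$ is obtained by interchanging the first and second columns.
Property (v) can be easily established. An outline of the proof is given here. Let $\alpha_1, \ldots, \alpha_n$ denote the rows (columns) of a matrix $A$. The $i$-th and $j$-th positions are indicated in the following sequence of steps and the postulate used is indicated at the right of each line:

$$\begin{aligned}
|A| &= f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_n) \\
&= f(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_i + \alpha_j, \ldots, \alpha_n) && (\beta) \\
&= -f(\alpha_1, \ldots, \alpha_i, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && (\alpha) \\
&= -f(\alpha_1, \ldots, -\alpha_j, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && (\beta) \\
&= f(\alpha_1, \ldots, \alpha_j, \ldots, -\alpha_i - \alpha_j, \ldots, \alpha_n) && (\alpha) \\
&= f(\alpha_1, \ldots, \alpha_j, \ldots, -\alpha_i, \ldots, \alpha_n) && (\beta) \\
&= -f(\alpha_1, \ldots, \alpha_j, \ldots, \alpha_i, \ldots, \alpha_n). && (\alpha)
\end{aligned}$$

In the above steps the $i$-th row (column) $\alpha_i$ and the $j$-th row (column) $\alpha_j$ are shown to indicate the changes in these rows (columns) at the $i$-th and the $j$-th positions. For example, step 2 above says that the $i$-th row (column) is added to the $j$-th row (column), the value of the determinant remaining the same.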


Example 3.1.1. Evaluate the determinant of the following matrix: 


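$$A = \begin{bmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 2 & 1 & -2 & 4 \\ 5 & 0 & -1 & 1 \end{bmatrix}.$$

Solution 3.1.1. We add $(-2)$ times the first row to the third row; the value of the determinant remains the same according to postulate $(\beta)$. We will indicate the steps by using our standard notations of Chapters 1 and 2.

$$|A| = \begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 2 & 1 & -2 & 4 \\ 5 & 0 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 0 & -1 & 0 & 0 \\ 5 & 0 & -1 & 1 \end{vmatrix}.$$

Now, starting with the determinant on the right above we add $(-5)$ times the first row to the last row; the value of the determinant remains the same according to postulate $(\beta)$. That is,

$$-5(1) + (4) \;\Rightarrow\; |A| = \begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & -5 & 4 & -9 \end{vmatrix}.$$

Now we start with the second row and apply postulate $(\beta)$ repeatedly.

$$-1(2) + (3); \; -5(2) + (4) \;\Rightarrow\; |A| = \begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & -3 & 0 \\ 0 & 0 & -11 & -9 \end{vmatrix}.$$

For the next step we take out $(-3)$ from the third row by using postulate $(\alpha)$ and then add 11 times the third row to the fourth row. That is,

$$|A| = (-3)\begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -11 & -9 \end{vmatrix} = (-3)\begin{vmatrix} 1 & 1 & -1 & 2 \\ 0 & -1 & 3 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -9 \end{vmatrix} = (-3)(1)(-1)(1)(-9) = -27.$$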


The last step above is obtained by multiplying the diagonal elements because the ma- 
trix in the determinant is triangular. The above operations were in fact elementary 
operations. Our aim was to reduce the matrix to a triangular form (upper or lower) 
so that we know that the determinant would be the product of the diagonal elements 
there. 
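The reduction to triangular form used in Solution 3.1.1 can be automated. The sketch below is one possible implementation, not the book's own algorithm: it uses exact fractions, adds multiples of a pivot row to the rows below it (which leaves the determinant unchanged), and records a factor of $-1$ for every row interchange. The function name is an assumption.

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in rows]
    n = len(M)
    sign = 1
    for col in range(n):
        # Look for a nonzero element at or below the diagonal position.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)          # a null column: determinant is zero
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign                # one interchange multiplies by (-1)
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Adding a multiple of one row to another keeps the value.
            M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]               # product of the diagonal elements
    return result


A = [[1, 1, -1, 2],
     [0, -1, 3, 0],
     [2, 1, -2, 4],
     [5, 0, -1, 1]]
print(det_by_elimination(A))   # -27, as in Solution 3.1.1
```

Unlike the hand computation, this version allows fractions instead of rescaling rows to avoid them, so no reciprocal factors need to be kept outside.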


Example 3.1.2. Evaluate the determinant of the following matrix: 


$$A = \begin{bmatrix} 0 & 2 & 1 & 5 \\ 4 & 0 & 1 & 2 \\ -7 & 1 & -1 & 0 \\ 5 & 0 & 1 & 2 \end{bmatrix}.$$

Solution 3.1.2. The element at the (1,1)-th position is zero. We will bring a nonzero element at the (1,1)-th position by interchanging rows or columns. For the above $A$ if the first and the third columns are interchanged then we can bring a convenient number at the (1,1)-th position. Remember that by this interchange the original determinant is multiplied by $(-1)$:

$$|A| = -\begin{vmatrix} 1 & 2 & 0 & 5 \\ 1 & 0 & 4 & 2 \\ -1 & 1 & -7 & 0 \\ 1 & 0 & 5 & 2 \end{vmatrix}.$$

Now we do the following operations on the rows, without altering the value of the determinant:

$$-1(1) + (2); \; (1) + (3); \; -1(1) + (4) \;\Rightarrow\; |A| = -\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -2 & 4 & -3 \\ 0 & 3 & -7 & 5 \\ 0 & -2 & 5 & -3 \end{vmatrix}.$$

By adding $(-1)$ times the second row to the fourth row we can get rid of the number at the (4,2)-th position. That is,

$$|A| = -\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -2 & 4 & -3 \\ 0 & 3 & -7 & 5 \\ 0 & 0 & 1 & 0 \end{vmatrix}.$$

In order to get rid of the element at the (3,2)-th position we can divide the second row by $(-2)$ and then operate with the new second row. This brings in fractions and the computations become complicated. But fractions can be avoided by multiplying the second row by 3 and the third row by 2 and then operating with the second row. [When a row is multiplied by a nonzero scalar remember to keep its reciprocal outside to maintain the value of the determinant.]

$$|A| = -\frac{1}{(2)(3)}\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -6 & 12 & -9 \\ 0 & 6 & -14 & 10 \\ 0 & 0 & 1 & 0 \end{vmatrix} = -\frac{1}{(2)(3)}\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -6 & 12 & -9 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 1 & 0 \end{vmatrix}.$$

The last determinant above is the result of the operation $[(2) + (3) \Rightarrow]$. Now we can either interchange the third and the fourth rows to avoid fractions or multiply the fourth row by 2 and then add the third row to the fourth row. Using the latter we have

$$|A| = -\frac{1}{12}\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -6 & 12 & -9 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 2 & 0 \end{vmatrix} = -\frac{1}{12}\begin{vmatrix} 1 & 2 & 0 & 5 \\ 0 & -6 & 12 & -9 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & 1 \end{vmatrix}.$$

Now the matrix is brought to a triangular form and hence the determinant is the product of the diagonal elements. That is,

$$|A| = -\frac{1}{12}(1)(-6)(-2)(1) = -1.$$

When evaluating a determinant the following are the steps to remember:

(1) If the element at the (1,1)-th position is 1 or the least common multiple of the elements in the first column then start the operations. With the help of the first row make all elements in the first column, except the first one, zeros. Adding any multiple of any row to any other row will not change the value of the determinant. The basic aim is to bring the matrix of the determinant to a triangular form.

(2) If the element at the (1,1)-th position is not 1 or not the least common multiple of the elements in the first column then either by adding a row (column) to the first row (column) or by interchange bring a suitable number to the (1,1)-th position. For each interchange keep one $(-1)$ each outside because by one interchange of rows or columns the determinant is multiplied by $(-1)$. If these operations do not bring a convenient number to the (1,1)-th position then multiply the rows (columns), as many of them as necessary, so that when suitable multiples of the resulting first row are added to the other rows fractions are avoided. When any row (column) is multiplied by a number $c \neq 0$ remember to keep $\frac{1}{c}$ outside to maintain the value of the determinant.

(3) After reducing the elements in the first column, except the first one, to zeros start with the second row. Follow through the above steps with regard to the element at the (2,2)-th position. Operate only with the rows and columns from the second onward. Otherwise the triangular nature of the final format will not be achieved. Try to make all elements on the second column, starting from the third element onward, zeros. Repeat the same process with the third row, fourth row and so on.

(4) The value of the determinant is the product of the diagonal elements in the final triangular format, multiplied by the quantities which were kept outside to maintain the value of the determinant.

(5) For matrices of order $n = 2, 3$, that is $n \times n$ matrices with $n = 2$ or $n = 3$, evaluate the determinants directly. The 2 x 2 case is already dealt with in property (i) and a mechanical procedure for the 3 x 3 case will be considered in the next section. For $n \ge 4$ use the steps (1) to (4) given above.


When every element of a matrix is multiplied by a scalar quantity c we say that the 
matrix is multiplied by c whereas when any one particular row or column of a matrix 
is multiplied by c its determinant is multiplied by c. Thus we have the following result: 


(vi) When an $n \times n$ matrix $A$ is multiplied by the scalar $c$ its determinant is multiplied by $c^n$. That is,

$$|cA| = c^n|A|, \quad \text{for example,} \quad |-A| = (-1)^n|A|, \quad |2A| = 2^n|A|.$$

(vii) The value of the determinant of a matrix of real numbers can be negative, positive or zero.


From our postulate $(\beta)$ the value of the determinant remains the same if any multiple of any row (column) is added to any other row (column). Thus if one or more rows (columns) are linearly dependent on other rows (columns) then these dependent rows (columns) can be made null by linear operations. Then the determinant is zero. Thus we have the following result:


(viii) The determinant of a singular matrix (rows/columns are linearly dependent) is zero and that of a nonsingular matrix is nonzero. Alternatively, one can define singularity or nonsingularity of a matrix through this property of its determinant.


Definition 3.1.1. A square matrix $A$ is singular iff $|A| = 0$ and nonsingular iff $|A| \neq 0$.


3.1.2 A mechanical way of evaluating a 3 x 3 determinant 


We can evaluate an n x n determinant, by adding suitable combinations of rows 
(columns) to other rows (columns) and using the basic properties of determinants 
as shown in the illustrative examples. If the steps are applied to a 3 x 3 determinant 
the final answer can be shown to be equivalent to the expression obtained by the 
following mechanical procedure. Let 


$$A = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}.$$

Write down the first two columns as the fourth and the fifth columns. Then we have the configuration

$$\begin{matrix} a_1 & a_2 & a_3 & a_1 & a_2 \\ b_1 & b_2 & b_3 & b_1 & b_2 \\ c_1 & c_2 & c_3 & c_1 & c_2 \end{matrix}$$

Take the product of the leading diagonal elements starting with the (1,1)-th, then with the (1,2)-th, and then with the (1,3)-th elements. We have three terms from this operation. Now we look at the second diagonal, the diagonal going from the bottom left corner and up. Take the product of the elements in the second diagonal starting with the (3,1)-th element, then with the (3,2)-th element and then with the (3,3)-th element. We have a second set of three terms. Multiply each term in the second set by $(-1)$. The sum of these two sets of 6 terms is the value of the determinant. The final answer is the following:

$$|A| = \begin{vmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{vmatrix} = a_1b_2c_3 + a_2b_3c_1 + a_3b_1c_2 - a_3b_2c_1 - a_1b_3c_2 - a_2b_1c_3. \qquad (3.1.1)$$

[The student may verify it by evaluating the determinant directly.] This mechanical procedure works only for the 3 x 3 case.


Example 3.1.3. Evaluate the determinant of A by using the mechanical procedure in 
(3.1.1) and verify the result by evaluating it with the help of the postulates, where, 


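$$A = \begin{bmatrix} 1 & 2 & 4 \\ -1 & 3 & 2 \\ -2 & 0 & -1 \end{bmatrix}.$$

Solution 3.1.3. For getting the solution from the mechanical procedure we write the first two columns again and then from the leading and the second diagonals of the new configuration we obtain the final answer. The new configuration is the following:

$$\begin{matrix} 1 & 2 & 4 & 1 & 2 \\ -1 & 3 & 2 & -1 & 3 \\ -2 & 0 & -1 & -2 & 0 \end{matrix}$$

$$\begin{aligned}
|A| &= [(1)(3)(-1) + (2)(2)(-2) + (4)(-1)(0)] - [(4)(3)(-2) + (1)(2)(0) + (2)(-1)(-1)] \\
&= [(-3) + (-8) + (0)] - [(-24) + (0) + (2)] \\
&= 11.
\end{aligned}$$

In order to verify the result let us evaluate the determinant directly through elementary operations. [The operations are listed on the right of each line.]

$$\begin{aligned}
|A| &= \begin{vmatrix} 1 & 2 & 4 \\ -1 & 3 & 2 \\ -2 & 0 & -1 \end{vmatrix} \\
&= \begin{vmatrix} 1 & 2 & 4 \\ 0 & 5 & 6 \\ 0 & 4 & 7 \end{vmatrix} && [(1) + (2); \; 2(1) + (3) \Rightarrow] \\
&= \frac{1}{20}\begin{vmatrix} 1 & 2 & 4 \\ 0 & 20 & 24 \\ 0 & 20 & 35 \end{vmatrix} && [4(2); \; 5(3) \Rightarrow] \\
&= \frac{1}{20}\begin{vmatrix} 1 & 2 & 4 \\ 0 & 20 & 24 \\ 0 & 0 & 11 \end{vmatrix} && [-1(2) + (3) \Rightarrow] \\
&= \frac{1}{20}(1)(20)(11) = 11.
\end{aligned}$$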
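The mechanical procedure of (3.1.1) translates directly into a few lines of code. This is a minimal sketch; the function name `det3_mechanical` is ours, and the routine is meant only for 3 x 3 matrices.

```python
def det3_mechanical(m):
    """3 x 3 determinant by the leading/second diagonal rule of (3.1.1)."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = m
    # Products along the three leading diagonals of the 3 x 5 configuration.
    leading = a1 * b2 * c3 + a2 * b3 * c1 + a3 * b1 * c2
    # Products along the three second diagonals (bottom left going up).
    second = a3 * b2 * c1 + a1 * b3 * c2 + a2 * b1 * c3
    return leading - second


A = [[1, 2, 4],
     [-1, 3, 2],
     [-2, 0, -1]]
print(det3_mechanical(A))   # 11, agreeing with Solution 3.1.3
```

The unpacking in the first line of the function fails for any matrix that is not 3 x 3, which matches the remark that the procedure works only in that case.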


Let us evaluate a 3 x 3 determinant of $A = (a_{ij})$ by using the postulate $(\gamma)$ and then $(\alpha)$ and $(\beta)$. Let us open up the first row.

$$(a_{11}, a_{12}, a_{13}) = a_{11}(1, 0, 0) + a_{12}(0, 1, 0) + a_{13}(0, 0, 1).$$

Then by postulate $(\gamma)$

$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} 0 & 1 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 0 & 0 & 1 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}.$$

Consider the first matrix on the right. Using the first row, by adding suitable multiples of it to the other rows (postulates $(\alpha)$ and $(\beta)$) we can wipe out all elements in the first column, except the first element which is 1. Similarly, by using the first row we can wipe out all elements in the second column, except the first one, in the second matrix on the right. We can wipe out all elements in the third column, except the first one, in the third matrix on the right. The result is the following:

$$|A| = a_{11}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} + a_{12}\begin{vmatrix} 0 & 1 & 0 \\ a_{21} & 0 & a_{23} \\ a_{31} & 0 & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 0 & 0 & 1 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 0 \end{vmatrix}.$$

Then by transpositions (one transposition is one interchange of adjacent rows or columns which will result in one minus sign coming out of the determinant also) of the columns we can bring the second column in the second matrix on the right to the first column position and the third column in the third matrix on the right to the first column position. In the second matrix on the right we need one transposition and hence one minus sign will come out. In the third matrix on the right we need two transpositions and hence $(-1)^2 = 1$ will come out. The final result is the following:

$$|A| = a_{11}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{21} & a_{23} \\ 0 & a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{21} & a_{22} \\ 0 & a_{31} & a_{32} \end{vmatrix}. \qquad (i)$$


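Now, we will consider each determinant on the right. For example, consider the first one. Open up the second row by using postulate $(\gamma)$. Then we have

$$\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} = a_{22}\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & a_{32} & a_{33} \end{vmatrix} + a_{23}\begin{vmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & a_{32} & a_{33} \end{vmatrix}. \qquad (ii)$$

Now by using the second row wipe out the second column elements in the first matrix and the third column elements in the second matrix on the right, and in the second matrix make one transposition to bring the third column to the second column position (natural order). This will bring one minus sign there. That is,

$$\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} = a_{22}\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & a_{33} \end{vmatrix} - a_{23}\begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & a_{32} \end{vmatrix}.$$

Now take out the elements from the third row in both matrices and we have

$$a_{11}\begin{vmatrix} 1 & 0 & 0 \\ 0 & a_{22} & a_{23} \\ 0 & a_{32} & a_{33} \end{vmatrix} = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32}.$$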


When writing the elements keep the first subscripts in the natural order 1, 2, 3. Note that the second term is $a_{11}a_{23}a_{32}$ and here the second subscripts are in the order 1, 3, 2. One transposition is needed to bring 1, 3, 2 into the natural order 1, 2, 3 and hence the multiplicative factor outside is $(-1)^1 = -1$. Now open up the second and third terms in (i) to get the balance of the terms in the mechanical procedure indicated earlier. This is how we obtained the terms in the mechanical procedure. Note that when it is a 3 x 3 determinant, when we open up, we have $3! = 6$ terms. In each term, one and only one element will come from each row and each column of $A$. Write these $3!$ terms with the first subscripts in the natural order 1, 2, 3. Now look at the second subscripts. If the number of transpositions needed to bring this into the natural order 1, 2, 3 is odd then multiply by $-1$, if even then multiply by $+1$. The possible terms in our case are the following:

$$a_{11}a_{22}a_{33}, \quad a_{11}a_{23}a_{32}; \qquad a_{12}a_{21}a_{33}, \quad a_{12}a_{23}a_{31}; \qquad a_{13}a_{21}a_{32}, \quad a_{13}a_{22}a_{31}.$$

In the first term the second subscripts are in the order (1, 2, 3). This is in the natural order. Hence the number of transpositions needed to bring this to the natural order is 0 and hence we multiply by $(-1)^0 = +1$. The term $a_{11}a_{23}a_{32}$ has the second subscripts in the order (1, 3, 2). One transposition, namely the interchange of the second and third elements, will bring this to the natural order (1, 2, 3). Hence we multiply this term by $(-1)^1 = -1$. The various terms and the signs are as given below:

Term                        Sign    Final term
$a_{11}a_{22}a_{33}$        +1      $a_{11}a_{22}a_{33}$
$a_{11}a_{23}a_{32}$        -1      $-a_{11}a_{23}a_{32}$
$a_{12}a_{21}a_{33}$        -1      $-a_{12}a_{21}a_{33}$
$a_{12}a_{23}a_{31}$        +1      $a_{12}a_{23}a_{31}$
$a_{13}a_{22}a_{31}$        -1      $-a_{13}a_{22}a_{31}$
$a_{13}a_{21}a_{32}$        +1      $a_{13}a_{21}a_{32}$

Then

$$|A| = [a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32}] - [a_{11}a_{23}a_{32} + a_{12}a_{21}a_{33} + a_{13}a_{22}a_{31}],$$
the same result obtained from the mechanical procedure in (3.1.1).


By using the same procedure as used to open up a 3 x 3 determinant, we can open up an $n \times n$ determinant $|A| = |(a_{ij})|$. There will be $n!$ terms. In each of these $n!$ terms there will be one and only one element coming from each row and each column of $A$. A typical term can be written in the following form:

$$a_{1i_1}a_{2i_2}\cdots a_{ni_n}$$
where $i_1, \ldots, i_n$ represent the column numbers and the first subscripts the row numbers (taken in the natural order, $1, 2, \ldots, n$). A term will be multiplied by $(+1)$ if the number of transpositions needed (one transposition is one interchange of two adjacent columns or adjacent second subscripts) to bring $(i_1, \ldots, i_n)$ into the natural order $(1, 2, \ldots, n)$ is even and if this number is odd then the term is multiplied by $(-1)$. The final value of the determinant can be written as follows, which can also be used as a definition for the determinant:


(ix) Let $A = (a_{ij})$ be an $n \times n$ matrix. Then its determinant is given by

$$|A| = \sum_{i_1}\cdots\sum_{i_n} (-1)^{\rho(i_1, \ldots, i_n)}\, a_{1i_1}a_{2i_2}\cdots a_{ni_n} \qquad (3.1.2)$$

where $\rho(i_1, \ldots, i_n)$ stands for the number of transpositions needed to bring $(i_1, \ldots, i_n)$ to the natural order $(1, 2, \ldots, n)$.


We may note that when $n$ is large then (3.1.2) is not that easy to compute. Even for $n = 3$ it is found to be quite involved. Hence in a practical situation we will use the postulates mentioned in the beginning, along with the properties seen so far, to evaluate a determinant. Even though (3.1.2) is not the most efficient way of evaluating a determinant this representation has many theoretical uses.
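To make the representation in (3.1.2) concrete, the sketch below sums over all $n!$ arrangements of the column subscripts. It assumes that the number of adjacent transpositions needed to sort an arrangement equals its number of inversions (pairs that are out of order), which is how the sign is computed here; the function names are ours.

```python
from itertools import permutations


def adjacent_transpositions(cols):
    """Number of out-of-order pairs, i.e. adjacent interchanges needed to sort."""
    n = len(cols)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if cols[i] > cols[j])


def det_by_expansion(A):
    """Determinant as a signed sum of n! products, one element per row/column."""
    n = len(A)
    total = 0
    for cols in permutations(range(n)):
        sign = -1 if adjacent_transpositions(cols) % 2 else 1
        term = 1
        for row, col in enumerate(cols):   # first subscripts in natural order
            term *= A[row][col]
        total += sign * term
    return total


A = [[1, 2, 4],
     [-1, 3, 2],
     [-2, 0, -1]]
print(det_by_expansion(A))   # 11, the value found in Example 3.1.3
```

The loop runs over $n!$ terms, so this is practical only for small $n$; it illustrates the formula rather than offering an efficient method.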

Let us see what happens if one row of a 2 x 2 matrix is split into two rows. Let

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a_1 + a_2 & b_1 + b_2 \\ c & d \end{bmatrix}.$$

Here the first row is written as a sum of two rows, namely,

$$(a, b) = (a_1, b_1) + (a_2, b_2).$$

Then the determinant

$$\begin{aligned}
|A| = ad - bc &= (a_1 + a_2)d - (b_1 + b_2)c \\
&= (a_1d - b_1c) + (a_2d - b_2c) \\
&= \begin{vmatrix} a_1 & b_1 \\ c & d \end{vmatrix} + \begin{vmatrix} a_2 & b_2 \\ c & d \end{vmatrix}.
\end{aligned}$$

The determinant has split into a sum of two determinants. Thus it is clear that if (3.1.2) is used as a definition for the determinant then postulate $(\gamma)$ can be derived from (3.1.2) itself. Let us try to verify this result with a numerical example.


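Example 3.1.4. Let

$$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & 1 \\ -2 & 1 & 3 \end{bmatrix}.$$

(a) Evaluate the determinant of $A$; (b) Let
$$(1, 1, -1) = (0, 1, -1) + (1, 0, 0).$$

Evaluate the determinant as the sum of two determinants with the first row replaced by $(0, 1, -1)$ and $(1, 0, 0)$ respectively and show that this sum is equal to the determinant of $A$ evaluated in (a).


Solution 3.1.4. By elementary operations

$$\begin{aligned}
|A| &= \begin{vmatrix} 1 & 1 & -1 \\ 2 & 0 & 1 \\ -2 & 1 & 3 \end{vmatrix} = \begin{vmatrix} 1 & 1 & -1 \\ 0 & -2 & 3 \\ 0 & 3 & 1 \end{vmatrix} \\
&= \frac{1}{6}\begin{vmatrix} 1 & 1 & -1 \\ 0 & -6 & 9 \\ 0 & 6 & 2 \end{vmatrix} = \frac{1}{6}\begin{vmatrix} 1 & 1 & -1 \\ 0 & -6 & 9 \\ 0 & 0 & 11 \end{vmatrix} \\
&= \frac{1}{6}(1)(-6)(11) = -11.
\end{aligned}$$

Let $A_1$ and $A_2$ be the matrices with the first row of $A$ replaced by the vectors $(0, 1, -1)$ and $(1, 0, 0)$ respectively. Let us evaluate their determinants through elementary operations. In $A_1$, since the (1,1)-th element is 0, interchange the first and the second columns before starting the row operations.

$$\begin{aligned}
|A_1| &= \begin{vmatrix} 0 & 1 & -1 \\ 2 & 0 & 1 \\ -2 & 1 & 3 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ 1 & -2 & 3 \end{vmatrix} \\
&= -\begin{vmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ 0 & -2 & 4 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ 0 & 0 & 5 \end{vmatrix} \\
&= -(1)(2)(5) = -10;
\end{aligned}$$

$$|A_2| = \begin{vmatrix} 1 & 0 & 0 \\ 2 & 0 & 1 \\ -2 & 1 & 3 \end{vmatrix} = -\begin{vmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -2 & 3 & 1 \end{vmatrix} \quad \text{[columns 2, 3 interchanged]}$$

$$= -(1)(1)(1) = -1.$$
Therefore
$$|A_1| + |A_2| = (-10) + (-1) = -11 = |A|.$$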

The result is verified. 


Before closing this section let us examine a few more properties which follow from 
the observations so far. From the representation of the determinant as the sum of all 
possible terms consisting of one element each from each row and each column, given 
in property (ix), or from the postulates themselves, it follows that any matrix A and its 
transpose A’ have the same determinant. 


(x) For any $n \times n$ matrix $A$ with $A'$ denoting its transpose
$$|A| = |A'|.$$
For a skew symmetric matrix, $A' = -A$ which means that
$$|A| = |A'| = |-A| = (-1)^n|A|.$$
Therefore if $n$ is odd then

$$|A| = -|A| \;\Rightarrow\; |A| = 0.$$

Therefore we have the following result:
(xi) If an $n \times n$ matrix $A$ is skew symmetric then $|A| = 0$ if $n$ is odd, or all skew symmetric matrices of odd order are singular.
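Properties (x) and (xi) are easy to spot-check numerically. The sketch below uses a small determinant routine based on the signed-sum representation of property (ix); the sample matrices and helper names are illustrative assumptions.

```python
from itertools import permutations


def det(A):
    """Determinant by the signed sum over arrangements of column subscripts."""
    n = len(A)
    total = 0
    for cols in permutations(range(n)):
        inversions = sum(cols[i] > cols[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(cols):
            term *= A[row][col]
        total += term
    return total


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, 2, 4], [-1, 3, 2], [-2, 0, -1]]
print(det(A), det(transpose(A)))      # equal, by property (x)

# A skew symmetric matrix of odd order: K' = -K.
K = [[0, 2, -5], [-2, 0, 3], [5, -3, 0]]
print(det(K))                         # zero, by property (xi)

# For even order the determinant need not vanish.
J = [[0, 1], [-1, 0]]
print(det(J))
```

The 2 x 2 matrix `J` shows why the restriction to odd order matters: it is skew symmetric but nonsingular.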

3.1.3 Diagonal and triangular block matrices 


Consider the matrices

$$A = \begin{bmatrix} P & O \\ O & S \end{bmatrix}, \quad B = \begin{bmatrix} P & Q \\ O & S \end{bmatrix}, \quad C = \begin{bmatrix} P & O \\ R & S \end{bmatrix} \qquad (3.1.3)$$

where $O$ indicates a null matrix, $P$ is $r \times r$, $S$ is $s \times s$. Then $A$ is called a diagonal block matrix or a block diagonal matrix, and $B$ and $C$ are called triangular block matrices where $B$ is an upper triangular type and $C$ is a lower triangular block matrix. By elementary operations on the first $r$ columns or on the last $s$ rows we can reduce $B$ to the form $A$ without affecting the value of the determinant. Similarly by using the first $r$ rows or the last $s$ columns of $C$ we can reduce $C$ to the form in $A$ without affecting the value of the determinant. Thus it follows that $A$, $B$ and $C$ have the same determinants. That is,

$$|A| = |B| = |C|. \qquad (3.1.4)$$


Now, notice that by operating on the first $r$ rows and columns in $A$ we can reduce $A$ to the form

$$|A| = |P|\begin{vmatrix} I_r & O \\ O & S \end{vmatrix}$$

if $|P| \neq 0$ or to zero if $|P| = 0$. Then operating on the last $s$ rows and columns of $A$ we can write

$$\begin{vmatrix} I_r & O \\ O & S \end{vmatrix} = |S|\begin{vmatrix} I_r & O \\ O & I_s \end{vmatrix}$$

if $|S| \neq 0$ or to 0 if $|S| = 0$. But from postulate $(\delta)$ we have

$$\begin{vmatrix} I_r & O \\ O & I_s \end{vmatrix} = |I| = 1.$$

Therefore we have the following result:


(xii) The determinant of a diagonal or triangular block matrix is the product of the determinants of the diagonal blocks.


In our illustrative example
$$|A| = |B| = |C| = |P|\,|S|. \qquad (3.1.5)$$
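Property (xii) can be illustrated with small integer blocks. In the sketch below the blocks $P$, $Q$, $R$, $S$ are arbitrary 2 x 2 choices made for illustration, and `block` and `det` are hypothetical helper names; the determinant routine again uses the signed-sum representation of property (ix).

```python
from itertools import permutations


def det(A):
    """Determinant by the signed sum over arrangements of column subscripts."""
    n = len(A)
    total = 0
    for cols in permutations(range(n)):
        inversions = sum(cols[i] > cols[j]
                         for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for row, col in enumerate(cols):
            term *= A[row][col]
        total += term
    return total


def block(top_left, top_right, bottom_left, bottom_right):
    """Assemble a partitioned matrix from four conformable blocks."""
    top = [l + r for l, r in zip(top_left, top_right)]
    bottom = [l + r for l, r in zip(bottom_left, bottom_right)]
    return top + bottom


P = [[2, 1], [1, 3]]      # r x r with r = 2
S = [[1, 3], [2, 4]]      # s x s with s = 2
Q = [[7, -1], [0, 5]]     # arbitrary r x s block
R = [[4, 4], [-2, 9]]     # arbitrary s x r block
O = [[0, 0], [0, 0]]      # null block

A = block(P, O, O, S)     # block diagonal
B = block(P, Q, O, S)     # upper triangular block matrix
C = block(P, O, R, S)     # lower triangular block matrix

print(det(A), det(B), det(C), det(P) * det(S))
```

All four printed values coincide, whatever integers are placed in `Q` and `R`, which is the content of (3.1.5).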
Example 3.1.5. Evaluate the following determinant directly as well as by using property (xii):


|A| = 


PN WwW 
NF oO 


0 
3]. 
4 


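Solution 3.1.5. The matrix in this determinant can be looked upon as a triangular block matrix. That is,

$$A = \begin{bmatrix} P & Q \\ R & S \end{bmatrix}, \quad P = (3), \quad Q = (0, 0), \quad S = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.$$

Then by using property (xii),

$$|A| = |P|\,|S| = (3)\begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix} = (3)[(1)(4) - (3)(2)] = -6.$$

For evaluating $|A|$ directly we can add $(-3)$ times the second column to the third column to obtain a lower triangular matrix with the diagonal elements 3, 1 and $-2$, so that $|A| = (3)(1)(-2) = -6$.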

One observation can be made from this example which is also a particular case of property (xii).


(xiii) If the first row (column) elements of an $n \times n$ matrix $B = (b_{ij})$ are zeros except the first element, say $b_{11}$, then the determinant of $B$ is given by

$$|B| = b_{11}|B_{11}| \qquad (3.1.6)$$

where $B_{11}$ is the matrix obtained by deleting the first row and the first column of $B$ or deleting the row and column containing $b_{11}$.


This property will be exploited in the next section. It is not a special property of the first row (column). If any other row (column) has all elements zeros except one element then by transpositions of rows and columns we can bring the nonzero element to the (1,1)-th position. Remember to multiply the determinant by $(-1)$ each time a transposition is done. Transpositions are done instead of direct interchange of rows (columns) in order to maintain the order of the elements in the remaining matrix. For an element in the $(i, j)$-th position the number of column transpositions needed is $j - 1$ and then the number of row transpositions needed is $i - 1$ and hence the total number of transpositions needed is $i + j - 2$ and the multiplicative factor is $(-1)^{i+j-2} = (-1)^{i+j}$. Then we can apply property (xiii).

We can extend the above ideas to look for the determinant of a product of matrices. Let us recall the basic elementary matrices of the E and the F types from Section 2.3. Postulate (β) of the definition of a determinant says that

$$|A| = |FA| = |F|\,|A| = |A|$$

where F is a basic elementary matrix of the F type. From postulate (α) we have

$$|EA| = |E|\,|A| \quad\text{or}\quad |A| = |E^{-1}|\,|EA|$$

where E is a basic elementary matrix of the E type. Note that an F type elementary operation (postulate (β)) has no effect on the determinant. Then $|AB| = |E_1^{-1}|\,|F_1^{-1}|\,|(F_1E_1A)B|$. Operating on the left of AB with elementary matrices is equivalent to operating on the left of A with elementary matrices: EAB = (EA)B and FAB = (FA)B. Then

$$|E_1^{-1}|\,|F_1^{-1}| \cdots |F_k^{-1}|\,|E_k^{-1}|\;|(E_kF_k \cdots F_1E_1A)B| = |AB|. \tag{i}$$

Suppose that $E_kF_k \cdots F_1E_1A = D_1$, a diagonal matrix, where $D_1 = I$ if A is nonsingular; otherwise $D_1$ is a diagonal matrix having at least one zero diagonal element, which makes $D_1B$ singular and $|D_1B| = 0$. In the singular case both $|AB|$ and $|A|\,|B|$ are therefore zero. When $E_kF_k \cdots F_1E_1A = I$ we have $A = E_1^{-1}F_1^{-1} \cdots F_k^{-1}E_k^{-1}$ and $|E_1^{-1}F_1^{-1} \cdots F_k^{-1}E_k^{-1}| = |A|$. Then from equation (i) above, when A is nonsingular or when $D_1 = I$, $|AB| = |E_1^{-1}F_1^{-1} \cdots F_k^{-1}E_k^{-1}| \times |B| = |A| \times |B|$.

(xiv) For n × n matrices A and B

$$|AB| = |A|\,|B|$$

and the property can be extended to any number of n × n matrices:

$$|ABC \cdots| = |A|\,|B|\,|C| \cdots.$$


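Example 3.1.6. Evaluate the determinants |AB|, |A|, |B| and verify that |AB| = |A| |B| where

$$A = \begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}.$$

Solution 3.1.6.

$$|A| = \begin{vmatrix} 1 & 2 \\ 1 & -1 \end{vmatrix} = (1)(-1) - (1)(2) = -3;$$

$$|B| = \begin{vmatrix} 0 & 1 \\ 2 & 1 \end{vmatrix} = (0)(1) - (2)(1) = -2;$$

$$|AB| = \left|\begin{bmatrix} 1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}\right| = \begin{vmatrix} 4 & 3 \\ -2 & 0 \end{vmatrix} = (4)(0) - (-2)(3) = 6 = (-3)(-2) = |A|\,|B|.$$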
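The verification above can also be carried out mechanically. The following is a minimal Python sketch, not part of the original text; the helper names `det2` and `matmul` are illustrative, and the matrices A and B are assumed to be the ones consistent with the arithmetic shown in Solution 3.1.6.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Assumed matrices, chosen to agree with the arithmetic of Solution 3.1.6.
A = [[1, 2], [1, -1]]
B = [[0, 1], [2, 1]]
AB = matmul(A, B)

print(det2(A), det2(B), det2(AB))  # -3 -2 6
```

Since all entries are integers the comparison |AB| = |A| |B| is exact here; with floating point entries one would compare up to a small tolerance.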


By using property (xiv) we can establish many results. Note that $A^r = AA \cdots A$, the product in which A is multiplied by itself a total of r times.
(xv)

$$|A^r| = |A|^r, \quad r = 1, 2, \ldots; \qquad |A^{-r}| = \frac{1}{|A|^r}, \quad r = 1, 2, \ldots, \quad |A| \neq 0.$$

This fact follows from the property

$$AA^{-1} = I \;\Rightarrow\; 1 = |AA^{-1}| = |A|\,|A^{-1}| \;\Rightarrow\; |A^{-1}| = \frac{1}{|A|}$$

where $A^0 = I$ by convention.


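(xvi) The determinant of an orthonormal matrix A is ±1.

$$AA' = I \;\Rightarrow\; 1 = |AA'| = |A|\,|A'| = |A|^2 \;\Rightarrow\; |A| = \pm 1.$$

(xvii) The determinant of an idempotent matrix is either 1 or 0.

$$A = A^2 \;\Rightarrow\; |A| = |A|^2 \;\Rightarrow\; |A|[1 - |A|] = 0 \;\Rightarrow\; |A| = 0, 1.$$

(xviii) The determinant of a nilpotent matrix is zero.

$$A^r = O,\ r \geq 2 \;\Rightarrow\; 0 = |A^r| = |A|^r \;\Rightarrow\; |A| = 0.$$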


Exercises 3.1 


3.1.1. Evaluate the determinants of the following matrices: 


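$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 3 & 1 \\ 0 & 0 & 4 \end{bmatrix},$$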
âu an Ain 


3.1.2. Evaluate the determinants of the following matrices:


10-12 F 2 -14 1 
TS de RR NES RUE M 
102 3 2 1 -1 0 
0 0 4 5 8 0 -1 1 
4 1 7 -3 
0 
c-l? 8 1 
5 -1 4 2 
2 1 2 5 


3.1.3. Let a and b be n × 1 column vectors, $a' = (a_1, \ldots, a_n)$, $b' = (b_1, \ldots, b_n)$. Evaluate the determinants of

(1) $ab'$, (2) $a'b$.


3.1.4. Consider the n × n matrix

$$A = \begin{bmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{bmatrix}$$

(the diagonal elements are all a and the off-diagonal elements are all b). Show that
(1) $|A| = (a - b)^{n-1}[a + (n - 1)b]$.
(2) Verify the result in (1) for a 4 × 4 matrix with a = 2, b = 3, by direct evaluation.


3.1.5. Evaluate the determinant of

$$V_3 = \begin{bmatrix} 1 & a_1 & a_1^2 \\ 1 & a_2 & a_2^2 \\ 1 & a_3 & a_3^2 \end{bmatrix}$$

and show that

$$|V_3| = (a_2 - a_1)(a_3 - a_1)(a_3 - a_2) = \prod_{i>j}(a_i - a_j).$$

3.1.6. If $V_n$ is an n × n matrix created as in Exercise 3.1.5, such a matrix is called a Vandermonde's matrix and its determinant is called a Vandermonde's determinant. Show that

$$|V_n| = \prod_{i>j}(a_i - a_j).$$
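3.1.7. Show that

$$\begin{vmatrix} 2 & 5 & 7 & 0 & 0 \\ -1 & 3 & 2 & 0 & 0 \\ 5 & 4 & 1 & 0 & 0 \\ 0 & 0 & 0 & 6 & 4 \\ 0 & 0 & 0 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 5 & 7 \\ -1 & 3 & 2 \\ 5 & 4 & 1 \end{vmatrix} \begin{vmatrix} 6 & 4 \\ -1 & 1 \end{vmatrix}.$$

3.1.8. Show that

$$\begin{vmatrix} 2 & 5 & 7 & 1 & 4 \\ -1 & 3 & 2 & 6 & 5 \\ 5 & 4 & 1 & 0 & 2 \\ 0 & 0 & 0 & 6 & 4 \\ 0 & 0 & 0 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 2 & 5 & 7 \\ -1 & 3 & 2 \\ 5 & 4 & 1 \end{vmatrix} \begin{vmatrix} 6 & 4 \\ -1 & 1 \end{vmatrix}.$$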
3.1.9. Let

$$A = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{bmatrix}$$

and let

$$(a_1, a_2, a_3) = a_1(1, 0, 0) + a_2(0, 1, 0) + a_3(0, 0, 1) = (a_1, 0, 0) + (0, a_2, 0) + (0, 0, a_3).$$

Show that

$$|A| = a_1|A_{11}| - a_2|A_{12}| + a_3|A_{13}|$$

where $A_{ij}$ represents the matrix obtained by deleting the row and column containing the (i, j)-th element, namely the i-th row and the j-th column. Verify the result for

$$A = \begin{bmatrix} 7 & 8 & -10 \\ 15 & 20 & 8 \\ 10 & 14 & 25 \end{bmatrix}.$$

[Such a decomposition reduces the order of the determinants involved, and hence the procedure is helpful when the elements of the matrix are large numbers.]


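3.1.10. Evaluate the determinants |A|, |B|, |AB| and verify the result that |AB| = |A| |B|, where

$$A = \begin{bmatrix} 1 & 0 & 4 \\ 2 & 1 & 1 \\ 1 & -1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 & 1 \\ -1 & 0 & 1 \\ 1 & 1 & -1 \end{bmatrix}.$$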
3.1.11. Construct (1) a 2 × 2 orthonormal matrix, (2) a 3 × 3 orthonormal matrix. Evaluate the determinants and show that the value is ±1 in each case.


3.1.12. Construct a nilpotent matrix of order 3 and evaluate its determinant and show 
that it is zero. 


3.1.13. Let $J' = (1, \ldots, 1)$, where J is an n × 1 vector of unities. Let $B = \frac{1}{n}JJ'$, $A = I_n - B$. Show that the determinants of both A and B are zeros, that is, both A and B are singular.


3.1.14. Let


NO DN 
oo uU UJ 


First evaluate the determinant, |A|, by elementary operations. Then show that |A| is 
also equal to the following: 


6 5 4 5 4 6 
i-a job [rol T 


3.1.15. For the same A in Exercise 3.1.14 show that the determinant is also equal to the following:


2 3 1 3 1 2 
A|-- E 
|A| = -(4) k | + (6) t 1 (5) | | 
2 3 1 3 1 2 
=D | -əl à «eL | 
3.1.16. Evaluate the determinant of the following matrix A in two steps: 


O0 1 -12 4 1 -2 5 
9 7 


m PR 
pax. 
jum 
NEUSS 
RUN 
ON 
nu 
eo 
v 
ray 
NX 


79 70 72 90 95 92 94 60 


Oo 1 1 -1 1 
-1 0 2 -35 
A-|-1-2 0 4 2 
1 3 -4 0 3 
-1 -5 -2 -3 0 
3.1.18. If A, B and A + B are nonsingular n × n matrices, is $A^{-1} + B^{-1}$ singular or nonsingular? Prove your assertion.


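3.1.19. Let A be m × n and B be n × m, m ≤ n. Let C = AB, the m × m matrix, so that |C| is defined. Show that

$$\text{(i)} \quad \begin{bmatrix} I_m & A \\ O & I_n \end{bmatrix}\begin{bmatrix} A & O \\ -I_n & B \end{bmatrix} = \begin{bmatrix} O & AB \\ -I_n & B \end{bmatrix},$$

$$\text{(ii)} \quad \begin{vmatrix} A & O \\ -I_n & B \end{vmatrix} = \begin{vmatrix} O & AB \\ -I_n & B \end{vmatrix} = (-1)^{(m+1)n}|AB|.$$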


3.1.20. Verify the results in (i), (ii) of Exercise 3.1.19 for 


3.2 Cofactor expansions 


A convenient way of evaluating a determinant, as well as of studying some theoretical properties of a determinant, is to expand the determinant in terms of what are known as cofactors. We define cofactors and minors of a matrix.


3.2.1 Cofactors and minors 


Definition 3.2.1 (Minors). Let $A = (a_{ij})$ be an n × n matrix. Delete m rows and m columns, m < n. The determinant of the resulting submatrix is called a minor. If the i-th row and the j-th column (that is, the row and column where the element $a_{ij}$ is present) are deleted then the determinant of the resulting submatrix is called the minor of $a_{ij}$.


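For example, let

$$A = \begin{bmatrix} 2 & 0 & -1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{bmatrix} = (a_{ij}).$$

Then

$$\begin{vmatrix} 2 & 4 \\ 1 & 5 \end{vmatrix} = \text{minor of } a_{11}, \quad \begin{vmatrix} 1 & 4 \\ 0 & 5 \end{vmatrix} = \text{minor of } a_{12},$$

$$\begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = \text{minor of } a_{13}, \quad \begin{vmatrix} 2 & -1 \\ 0 & 5 \end{vmatrix} = \text{minor of } a_{22},$$

$$\begin{vmatrix} 2 & 0 \\ 1 & 2 \end{vmatrix} = \text{minor of } a_{33}, \text{ and so on.}$$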


Definition 3.2.2 (Leading minors). If the submatrices are formed by deleting the rows 
and columns from the 2nd onward, from the 3rd onward, and so on then the corre- 
sponding minors are called the leading minors. 


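For example, for the same A above the leading minors are the following determinants:

$$2, \quad \begin{vmatrix} 2 & 0 \\ 1 & 2 \end{vmatrix}, \quad \begin{vmatrix} 2 & 0 & -1 \\ 1 & 2 & 4 \\ 0 & 1 & 5 \end{vmatrix}.$$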
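Definition 3.2.3 (Cofactors). Let $A = (a_{ij})$ be an n × n matrix. The cofactor of $a_{ij}$ is defined as $(-1)^{i+j}$ times the minor of $a_{ij}$. That is, if the cofactor and minor of $a_{ij}$ are denoted by $|C_{ij}|$ and $|M_{ij}|$ respectively then

$$|C_{ij}| = (-1)^{i+j}|M_{ij}|.$$

For example, let

$$A = \begin{bmatrix} 2 & 0 & 5 \\ -2 & 1 & 4 \\ 2 & 3 & 7 \end{bmatrix} = (a_{ij}),$$

then

$$|C_{11}| = (-1)^{1+1}\begin{vmatrix} 1 & 4 \\ 3 & 7 \end{vmatrix} = \begin{vmatrix} 1 & 4 \\ 3 & 7 \end{vmatrix} = -5 = \text{cofactor of } a_{11};$$

$$|C_{12}| = (-1)^{1+2}\begin{vmatrix} -2 & 4 \\ 2 & 7 \end{vmatrix} = -\begin{vmatrix} -2 & 4 \\ 2 & 7 \end{vmatrix} = 22 = \text{cofactor of } a_{12};$$

$$|C_{13}| = (-1)^{1+3}\begin{vmatrix} -2 & 1 \\ 2 & 3 \end{vmatrix} = \begin{vmatrix} -2 & 1 \\ 2 & 3 \end{vmatrix} = -8 = \text{cofactor of } a_{13};$$

$$|C_{23}| = (-1)^{2+3}\begin{vmatrix} 2 & 0 \\ 2 & 3 \end{vmatrix} = -\begin{vmatrix} 2 & 0 \\ 2 & 3 \end{vmatrix} = -6 = \text{cofactor of } a_{23},$$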

and so on. Immediate applications of these concepts of minors and cofactors are the 
evaluation of the determinant of a square matrix, the evaluation of the inverse of a 
nonsingular matrix and many such items through what is known as a cofactor expan- 
sion. We will establish a general result regarding the evaluation of a determinant in 
terms of cofactors by examining a simple case. 
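Example 3.2.1. By using postulate (γ) expand the determinant of the following matrix in terms of the elements of the first row and their cofactors:

$$A = \begin{bmatrix} 4 & 2 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{bmatrix}.$$

Solution 3.2.1. Write the first row as

$$(4, 2, 3) = (4, 0, 0) + (0, 2, 0) + (0, 0, 3).$$

Then from the application of postulate (γ) repeatedly we have

$$|A| = \begin{vmatrix} 4 & 0 & 0 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} + \begin{vmatrix} 0 & 2 & 0 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} + \begin{vmatrix} 0 & 0 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix}.$$

Consider

$$\begin{vmatrix} 4 & 0 & 0 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} = 4\begin{vmatrix} 1 & 0 & 0 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} = 4\begin{vmatrix} -1 & 1 \\ 2 & -4 \end{vmatrix}$$

where the first step is done by taking out 4 from the first row (postulate (α)) and the second step is done by using property (xiii) of Section 3.1. Now consider the second determinant. By transposition of columns 1 and 2 we have

$$\begin{vmatrix} 0 & 2 & 0 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} = -\begin{vmatrix} 2 & 0 & 0 \\ -1 & 0 & 1 \\ 2 & 1 & -4 \end{vmatrix} = -2\begin{vmatrix} 0 & 1 \\ 1 & -4 \end{vmatrix} = -2[(0)(-4) - (1)(1)] = 2.$$

The last line is done by using property (xiii) of Section 3.1. Now consider the last determinant. By two transpositions we can bring the last column to the first column, then take out 3 and then apply property (xiii) of Section 3.1 to obtain

$$\begin{vmatrix} 0 & 0 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & -4 \end{vmatrix} = 3\begin{vmatrix} 0 & -1 \\ 1 & 2 \end{vmatrix} = 3[(0)(2) - (1)(-1)] = 3.$$

Thus we have the following expansion of the determinant:

$$|A| = 4\begin{vmatrix} -1 & 1 \\ 2 & -4 \end{vmatrix} - 2\begin{vmatrix} 0 & 1 \\ 1 & -4 \end{vmatrix} + 3\begin{vmatrix} 0 & -1 \\ 1 & 2 \end{vmatrix} = 8 + 2 + 3 = 13.$$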

From the above example and procedure it is clear that such an expansion is pos- 
sible for a general matrix. Thus we have the following general result: 
(i) Let $A = (a_{ij})$ be an n × n matrix. Let $|C_{ij}|$ denote the cofactor of $a_{ij}$ and $|M_{ij}|$ the minor of $a_{ij}$ respectively. Then

$$\begin{aligned} |A| &= a_{11}|C_{11}| + a_{12}|C_{12}| + \cdots + a_{1n}|C_{1n}| \\ &= a_{11}|M_{11}| - a_{12}|M_{12}| + \cdots + (-1)^{1+n}a_{1n}|M_{1n}| \\ &= a_{i1}|C_{i1}| + a_{i2}|C_{i2}| + \cdots + a_{in}|C_{in}| \\ &= a_{i1}(-1)^{i+1}|M_{i1}| + a_{i2}(-1)^{i+2}|M_{i2}| + \cdots + (-1)^{i+n}a_{in}|M_{in}| \end{aligned}$$

for i = 1, 2, ..., n.
This means that the cofactor expansion can be carried out in terms of the elements of any row and their cofactors. The same is true for columns, that is, the expansion can be carried out in terms of the elements of any column and their cofactors.
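Result (i) translates directly into a recursive procedure. The following is a minimal Python sketch, not taken from the original text; the names `minor_matrix` and `det` are illustrative, indices are 0-based, and the recursion is meant for small matrices only, since its cost grows like n!.

```python
def minor_matrix(A, i, j):
    """Submatrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A, row=0):
    """Determinant by cofactor expansion along the given row (0-based)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[row][j] == 0:
            continue  # zero elements contribute nothing to the expansion
        # (-1)**(row + j) has the same parity as (-1)^(i+j) with 1-based i, j
        sign = (-1) ** (row + j)
        total += A[row][j] * sign * det(minor_matrix(A, row, j))
    return total


# Matrix of Example 3.2.1; every row gives the same value.
A = [[4, 2, 3], [0, -1, 1], [1, 2, -4]]
print([det(A, row=i) for i in range(3)])  # [13, 13, 13]
```

Skipping zero elements mirrors the usual hand computation, where one expands along the row (column) containing the largest number of zeros.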


Example 3.2.2. Let

$$A = \begin{bmatrix} 2 & 1 & -7 \\ 1 & -1 & 2 \\ 0 & 3 & 4 \end{bmatrix}.$$

Evaluate the following: Sum of products of the elements of the first row (1) with the 
cofactors of the elements of the first row, (2) with the cofactors of the elements of the 
second row, (3) with the cofactors of the elements of the third row. 


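Solution 3.2.2. (1) The cofactors of the elements of the first row are the following:

$$|C_{11}| = \begin{vmatrix} -1 & 2 \\ 3 & 4 \end{vmatrix} = [(-1)(4) - (3)(2)] = -10,$$

$$|C_{12}| = -\begin{vmatrix} 1 & 2 \\ 0 & 4 \end{vmatrix} = -[(1)(4) - (0)(2)] = -4,$$

$$|C_{13}| = \begin{vmatrix} 1 & -1 \\ 0 & 3 \end{vmatrix} = [(1)(3) - (0)(-1)] = 3.$$

Therefore

$$|A| = (2)[-10] + (1)[-4] + (-7)[3] = -45.$$

(2) Now let us expand in terms of the elements of the first row with the cofactors of the elements of the second row. Let us denote this sum by p. Then

$$\begin{aligned} p &= a_{11}|C_{21}| + a_{12}|C_{22}| + a_{13}|C_{23}| \\ &= (2)(-1)^3\begin{vmatrix} 1 & -7 \\ 3 & 4 \end{vmatrix} + (1)(-1)^4\begin{vmatrix} 2 & -7 \\ 0 & 4 \end{vmatrix} + (-7)(-1)^5\begin{vmatrix} 2 & 1 \\ 0 & 3 \end{vmatrix} \\ &= -2(25) + (1)(8) + (7)(6) = 0. \end{aligned}$$

(3) Now let us expand in terms of the elements of the first row with the cofactors of the third row. Let us denote this sum by q. Then

$$\begin{aligned} q &= a_{11}|C_{31}| + a_{12}|C_{32}| + a_{13}|C_{33}| \\ &= (2)(-1)^4\begin{vmatrix} 1 & -7 \\ -1 & 2 \end{vmatrix} + (1)(-1)^5\begin{vmatrix} 2 & -7 \\ 1 & 2 \end{vmatrix} + (-7)(-1)^6\begin{vmatrix} 2 & 1 \\ 1 & -1 \end{vmatrix} \\ &= (2)(-5) + (-1)(11) + (-7)(-3) = 0. \end{aligned}$$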
We see that sums such as the ones in (2) and (3) are zero. This is in fact a general result, which can be proved by writing each cofactor as an n × n determinant rather than as an (n - 1) × (n - 1) determinant, so that the sum in each case becomes a sum of n × n determinants. Then we will see that in sums such as the ones in (2) and (3) above two rows will be identical and hence the determinant is zero.


(ii) Let $A = (a_{ij})$ be an n × n matrix with $|C_{ij}|$ denoting the cofactor of $a_{ij}$. Then, in terms of the row elements,

$$a_{i1}|C_{k1}| + a_{i2}|C_{k2}| + \cdots + a_{in}|C_{kn}| = 0$$

for all i and k, i ≠ k. In terms of the column elements,

$$a_{1i}|C_{1k}| + a_{2i}|C_{2k}| + \cdots + a_{ni}|C_{nk}| = 0$$

for all i and k, i ≠ k.
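Results (i) and (ii) together say that the sum of products of the elements of row i with the cofactors of row k is |A| when i = k and zero otherwise. The following Python sketch, which is not part of the original text, tabulates all such sums for a 3 × 3 matrix; the names `det` and `cofactor` are illustrative, and the matrix is assumed to be the one consistent with the arithmetic of Solution 3.2.2.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 0:
        return 1  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cofactor(A, i, j):
    """Signed minor of the (i, j)-th element (0-based indices)."""
    sub = [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]
    return (-1) ** (i + j) * det(sub)


A = [[2, 1, -7], [1, -1, 2], [0, 3, 4]]
n = len(A)

# S[i][k] = sum over j of a_ij * |C_kj|
S = [[sum(A[i][j] * cofactor(A, k, j) for j in range(n)) for k in range(n)]
     for i in range(n)]
print(S)  # [[-45, 0, 0], [0, -45, 0], [0, 0, -45]]
```

The table S is |A| times the identity matrix, which is exactly the fact used when the inverse is written in terms of the cofactor matrix.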


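Consider, for example, an expansion in terms of the elements of the second row and cofactors of the first row. For proving the result we use the original representation of the minors before wiping out the column elements, that is, let q be of the following form:

$$\begin{aligned} q &= a_{21}|C_{11}| + a_{22}|C_{12}| + \cdots + a_{2n}|C_{1n}| \quad \text{(in terms of the cofactors)} \\ &= a_{21}|M_{11}| - a_{22}|M_{12}| + \cdots + (-1)^{1+n}a_{2n}|M_{1n}| \quad \text{(in terms of minors)}. \end{aligned}$$

Now, writing in terms of the original determinants we have

$$q = \begin{vmatrix} a_{21} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} + \cdots + \begin{vmatrix} 0 & \cdots & 0 & a_{2n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = \begin{vmatrix} a_{21} & a_{22} & \cdots & a_{2n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \quad \text{(by postulate (γ))}$$

$$= 0$$

since two rows are identical in the matrix. The same procedure is applicable if the expansion is taken as the elements of the i-th row with the cofactors of the j-th row, i ≠ j.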


3.2.2 Inverse of a matrix in terms of the cofactor matrix 


One immediate application of the results in (i) and (ii) is to obtain the inverse of a nonsingular matrix in terms of a matrix of cofactors, thereby resulting in another method of evaluating the inverse of a nonsingular matrix. One method through elementary operations is already considered in Chapter 2. Let $A = (a_{ij})$ be an n × n matrix. Let $|C_{ij}|$ be the cofactor of $a_{ij}$. Consider a matrix created by replacing $a_{ij}$ by its cofactor. Let this matrix of cofactors be denoted by cof(A), called the cofactor matrix. Then


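$$\operatorname{cof}(A) = \begin{bmatrix} |C_{11}| & |C_{12}| & \cdots & |C_{1n}| \\ |C_{21}| & |C_{22}| & \cdots & |C_{2n}| \\ \vdots & \vdots & & \vdots \\ |C_{n1}| & |C_{n2}| & \cdots & |C_{nn}| \end{bmatrix}. \tag{3.2.1}$$

Consider the transpose of this matrix and premultiply this transpose by the matrix A. That is,

$$A[\operatorname{cof}(A)]' = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \vdots & \vdots & & \vdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{bmatrix}.$$

Then by the results (i) and (ii) all the diagonal elements of the product are equal to |A| each and all the off-diagonal elements are zeros. Thus we have

$$A[\operatorname{cof}(A)]' = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix} = |A|\,I. \tag{3.2.2}$$

Let

$$B = \frac{[\operatorname{cof}(A)]'}{|A|}$$

when |A| ≠ 0. Then

$$AB = I \;\Rightarrow\; B = A^{-1}.$$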


That is, B stands for the inverse of A whenever the matrix is nonsingular. We have the following result:

(iii) Let $A = (a_{ij})$ be an n × n matrix with |A| denoting the determinant, $|C_{ij}|$ the cofactor of $a_{ij}$, cof(A) the cofactor matrix of A, and [cof(A)]′ the transpose of the cofactor matrix. Then the inverse of A, whenever it exists, is given by

$$A^{-1} = \frac{1}{|A|}[\operatorname{cof}(A)]', \quad |A| \neq 0. \tag{3.2.3}$$

This formula (3.2.3) gives another way of computing the inverse of a nonsingular matrix.
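Formula (3.2.3) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's procedure: the names `det`, `cofactor_matrix` and `inverse` are hypothetical, exact rational arithmetic from the standard `fractions` module is used so that the product of A and its inverse can be compared with the identity exactly, and the matrix is the one from Example 3.2.1.

```python
from fractions import Fraction


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 0:
        return 1  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cofactor_matrix(A):
    """Matrix whose (i, j)-th element is the cofactor of a_ij."""
    n = len(A)
    return [[(-1) ** (i + j)
             * det([row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i])
             for j in range(n)]
            for i in range(n)]


def inverse(A):
    """Inverse as the transposed cofactor matrix divided by |A|."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0: the inverse does not exist")
    C = cofactor_matrix(A)
    n = len(A)
    # element (i, j) of the inverse is |C_ji| / |A|
    return [[Fraction(C[j][i]) / Fraction(d) for j in range(n)]
            for i in range(n)]


A = [[4, 2, 3], [0, -1, 1], [1, 2, -4]]
A_inv = inverse(A)
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
print(product == [[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # True
```

For anything beyond small matrices, elimination by elementary operations is far cheaper than computing n² cofactors.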


Example 3.2.3. Evaluate the inverse of A, if it exists, by using the cofactor matrix, where

$$A = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

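Solution 3.2.3. In order to apply this procedure we have to compute all the cofactors as well as the determinant. [Hence this method of evaluating the inverse is not that efficient unless the matrix is 2 × 2, in which case the determinant and the cofactor matrix can be read off without any computation.] The determinant of A through the cofactor expansion in terms of the first row is given by

$$|A| = (1)\begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} - (0)\begin{vmatrix} -1 & 1 \\ 1 & 1 \end{vmatrix} + (1)\begin{vmatrix} -1 & 1 \\ 1 & 1 \end{vmatrix} = -2.$$

Since |A| ≠ 0 the inverse exists. [If |A| were equal to 0 we would have stopped the process here itself.] The various cofactors are the following:

$$|C_{11}| = \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = 0, \quad |C_{12}| = -\begin{vmatrix} -1 & 1 \\ 1 & 1 \end{vmatrix} = 2, \quad |C_{13}| = \begin{vmatrix} -1 & 1 \\ 1 & 1 \end{vmatrix} = -2,$$

$$|C_{21}| = -\begin{vmatrix} 0 & 1 \\ 1 & 1 \end{vmatrix} = 1, \quad |C_{22}| = \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} = 0, \quad |C_{23}| = -\begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = -1,$$

$$|C_{31}| = \begin{vmatrix} 0 & 1 \\ 1 & 1 \end{vmatrix} = -1, \quad |C_{32}| = -\begin{vmatrix} 1 & 1 \\ -1 & 1 \end{vmatrix} = -2, \quad |C_{33}| = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1.$$

The matrix of cofactors is then

$$\operatorname{cof}(A) = \begin{bmatrix} 0 & 2 & -2 \\ 1 & 0 & -1 \\ -1 & -2 & 1 \end{bmatrix}.$$

Then

$$A^{-1} = \frac{1}{|A|}[\operatorname{cof}(A)]' = \frac{1}{-2}\begin{bmatrix} 0 & 1 & -1 \\ 2 & 0 & -2 \\ -2 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & -\frac{1}{2} & \frac{1}{2} \\ -1 & 0 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} \end{bmatrix}.$$

As a check,

$$AA^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ -1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 0 & -\frac{1}{2} & \frac{1}{2} \\ -1 & 0 & 1 \\ 1 & \frac{1}{2} & -\frac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I.$$

The result is verified. One major disadvantage of this method, compared to the method of elementary operations, is that here one has to evaluate one n × n determinant and $n^2$ determinants of order (n - 1) × (n - 1).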


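Example 3.2.4 (Covariance matrix). In statistical theory and its applications in many fields a concept called the covariance matrix plays a vital role. Let $X = \binom{x_1}{x_2}$ be a real bivariate vector random variable. The covariance matrix for this real bivariate case is given by

$$V = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}$$

where $\sigma_1 > 0$, $\sigma_2 > 0$ are the standard deviations of $x_1$ and $x_2$ respectively, their squares are the variances, and $\rho$ with $-1 < \rho < 1$ is the correlation between $x_1$ and $x_2$. When $\sigma_1 = 0$ and $\sigma_2 = 0$ the variables are degenerate. When $\rho = \pm 1$ the matrix V is singular and the variables are linearly related. Evaluate the inverse of V in the nonsingular case.

Solution 3.2.4. The determinant is

$$|V| = \sigma_1^2\sigma_2^2 - \rho^2\sigma_1^2\sigma_2^2 = \sigma_1^2\sigma_2^2(1 - \rho^2).$$

The cofactor matrix is then

$$\operatorname{cof}(V) = \begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}.$$

Since the cofactor matrix is symmetric here, its transpose is the same as itself. Hence

$$V^{-1} = \frac{1}{|V|}[\operatorname{cof}(V)]' = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho^2)}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} = \frac{1}{(1 - \rho^2)}\begin{bmatrix} \frac{1}{\sigma_1^2} & -\frac{\rho}{\sigma_1\sigma_2} \\ -\frac{\rho}{\sigma_1\sigma_2} & \frac{1}{\sigma_2^2} \end{bmatrix}.$$

[This is the matrix of the quadratic form appearing in the exponent of a real bivariate normal or real Gaussian density.]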


3.2.3 A matrix differential operator 


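Let $X = (x_{ij})$ be an m × n matrix with functionally independent real variables as its elements, that is, the $x_{ij}$'s are functionally independent (distinct) real variables. Functionally independent means that no element is a function of other variables. Then the matrix differential operator is defined as follows:

Definition 3.2.4 (A matrix differential operator).

$$\frac{\partial}{\partial X} = \left(\frac{\partial}{\partial x_{ij}}\right) = \begin{bmatrix} \frac{\partial}{\partial x_{11}} & \cdots & \frac{\partial}{\partial x_{1n}} \\ \vdots & & \vdots \\ \frac{\partial}{\partial x_{m1}} & \cdots & \frac{\partial}{\partial x_{mn}} \end{bmatrix}, \qquad \frac{\partial f}{\partial X} = \left(\frac{\partial f}{\partial x_{ij}}\right)$$

where f is a scalar function of the m × n matrix X. It is the matrix of the corresponding partial derivatives.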


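Example 3.2.5. Let X be a p × p matrix of functionally independent real variables and u its trace. Evaluate $\frac{\partial u}{\partial X}$.

Solution 3.2.5. Trace is the sum of the leading diagonal elements:

$$u = \operatorname{tr}(X) = x_{11} + x_{22} + \cdots + x_{pp}.$$

Therefore

$$\frac{\partial u}{\partial x_{11}} = 1 = \frac{\partial u}{\partial x_{22}} = \cdots = \frac{\partial u}{\partial x_{pp}}, \qquad \frac{\partial u}{\partial x_{ij}} = 0, \quad i \neq j.$$

Hence

$$\frac{\partial u}{\partial X} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = I_p.$$

Thus we have the following result:
(iv) When X is a p × p matrix of functionally independent real variables

$$u = \operatorname{tr}(X) \;\Rightarrow\; \frac{\partial u}{\partial X} = I_p.$$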
Example 3.2.6. Let X be a p × p matrix of functionally independent real variables. Let |X| be the determinant of X, |X| ≠ 0, and $\frac{\partial}{\partial X}$ the matrix differential operator. Evaluate $\frac{\partial |X|}{\partial X}$.

Solution 3.2.6. Consider the cofactor expansion of |X| in terms of the elements of the i-th row and their cofactors:

$$|X| = x_{i1}|C_{i1}| + x_{i2}|C_{i2}| + \cdots + x_{ip}|C_{ip}| \tag{3.2.4}$$

where $|C_{ij}|$ denotes the cofactor of $x_{ij}$. Note that $|C_{ij}|$ does not contain $x_{ij}$ and hence when the partial derivative of |X| is taken with respect to $x_{ij}$ we obtain $|C_{ij}|$. Thus when the matrix differential operator operates on |X| we get the cofactor matrix. That is,

$$\frac{\partial}{\partial X}|X| = \operatorname{cof}(X) = \begin{bmatrix} |C_{11}| & \cdots & |C_{1p}| \\ \vdots & & \vdots \\ |C_{p1}| & \cdots & |C_{pp}| \end{bmatrix}.$$

But we have already seen that the inverse of a nonsingular matrix is the transpose of the cofactor matrix divided by the determinant. Then

$$\frac{\partial |X|}{\partial X} = |X|(X^{-1})'.$$

Thus we have the following result:
(v) When X is a nonsingular matrix of distinct functionally independent real variables

$$\frac{\partial |X|}{\partial X} = |X|(X^{-1})'. \tag{3.2.5}$$
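The statement that the derivative of |X| with respect to $x_{ij}$ is the cofactor of $x_{ij}$ can be checked numerically. The following Python sketch is only an illustration and not part of the original text: the names `det`, `cofactor_matrix` and `numerical_gradient` are hypothetical, the test matrix is an arbitrary nonsymmetric choice, and central differences with an assumed step h = 1e-6 stand in for the partial derivatives.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 0:
        return 1.0  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cofactor_matrix(A):
    """Matrix whose (i, j)-th element is the cofactor of a_ij."""
    n = len(A)
    return [[(-1) ** (i + j)
             * det([row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i])
             for j in range(n)]
            for i in range(n)]


def numerical_gradient(f, X, h=1e-6):
    """Central-difference estimate of the matrix of partial derivatives of f."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h  # perturb only x_ij: all elements are distinct variables
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


X = [[2.0, 1.0, 0.0], [-1.0, 3.0, 1.0], [0.5, 1.0, 4.0]]
G = numerical_gradient(det, X)
C = cofactor_matrix(X)
max_gap = max(abs(G[i][j] - C[i][j]) for i in range(3) for j in range(3))
print(max_gap < 1e-6)  # True
```

Because |X| is linear in each single element, the central difference is exact up to rounding, so the agreement does not depend on h being very small.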


We can modify the above result to obtain a result for a nonsingular symmetric matrix. When X has functionally independent real variables as elements except for the property that X = X′, that is, X is symmetric, then following through the above procedure we can derive a result analogous to the one in (3.2.5). Observe that when X is symmetric we have

$$\frac{\partial |X|}{\partial x_{ii}} = |C_{ii}| = \text{cofactor of } x_{ii} \quad\text{and}\quad \frac{\partial |X|}{\partial x_{ij}} = 2|C_{ij}| = 2 \text{ times the cofactor of } x_{ij},\ i \neq j.$$

Thus when the matrix operator $\frac{\partial}{\partial X}$ operates on |X| we have the following format:

$$\frac{\partial |X|}{\partial X} = \begin{bmatrix} |C_{11}| & 2|C_{12}| & \cdots & 2|C_{1p}| \\ 2|C_{21}| & |C_{22}| & \cdots & 2|C_{2p}| \\ \vdots & \vdots & & \vdots \\ 2|C_{p1}| & 2|C_{p2}| & \cdots & |C_{pp}| \end{bmatrix}. \tag{3.2.6}$$

The diagonal elements are not multiplied by 2 whereas all the nondiagonal elements are multiplied by 2. A convenient notation for writing the right side of (3.2.6) is the following:

$$\frac{\partial |X|}{\partial X} = 2\operatorname{cof}(X) - \operatorname{diag}(\operatorname{cof}(X)) \tag{3.2.7}$$

where diag[cof(X)] means a diagonal matrix created with the diagonal elements of the matrix cof(X). Then converting (3.2.7) in terms of the inverse of X we have the following result:

(vi) When X is a nonsingular symmetric matrix of functionally independent real variables then

$$\frac{\partial |X|}{\partial X} = |X|[2X^{-1} - \operatorname{diag}(X^{-1})]. \tag{3.2.8}$$

This result has many applications, especially in obtaining the maximum likelihood estimators of the parameters in a multivariate Gaussian density. This is also applicable in general maxima/minima problems or optimization problems involving traces and determinants.
Example 3.2.7. Let $X = (x_{ij})$ be a p × p matrix of distinct functionally independent real scalar variables and let $\frac{\partial}{\partial X} = \left(\frac{\partial}{\partial x_{ij}}\right)$ be the matrix differential operator. Let $B = (b_{ij})$ be a p × p matrix of constants, that is, the $b_{ij}$'s are not functions of the $x_{ij}$'s. Let u = tr(BX). Evaluate $\frac{\partial u}{\partial X}$.

Solution 3.2.7. The first diagonal element in the product BX is

$$b_{11}x_{11} + b_{12}x_{21} + \cdots + b_{1p}x_{p1}$$

or the i-th diagonal element in BX is given by

$$b_{i1}x_{1i} + b_{i2}x_{2i} + \cdots + b_{ip}x_{pi} \tag{3.2.9}$$

and then the trace is the sum of (3.2.9) over all i, i = 1, ..., p. Thus the partial derivative of u = tr(BX) with respect to $x_{ij}$ is $b_{ji}$. This may be noted from (3.2.9). Hence when all the elements in X are distinct we have the matrix

$$\frac{\partial u}{\partial X} = B',$$

the transpose of B. If X = X′ then the partial derivatives of u = tr(BX) with respect to $x_{ij}$ give

$$\frac{\partial u}{\partial x_{ij}} = \begin{cases} b_{jj} & \text{for } j = i \\ b_{ij} + b_{ji} & \text{for } j \neq i. \end{cases} \tag{3.2.10}$$

Thus the diagonal elements of B come only once. Then the matrix configuration, using the notation in (vi), is

$$\frac{\partial u}{\partial X} = B + B' - \operatorname{diag}(B)$$

where diag(B) is the diagonal matrix created by using the diagonal elements of B. Thus we have the following result:

(vii) When $B = (b_{ij})$ is a p × p matrix of constants and $X = (x_{ij})$ a p × p matrix of functionally independent real scalar variables and when u = tr(BX) then

$$\frac{\partial u}{\partial X} = \begin{cases} B' & \text{if all elements in } X \text{ are distinct} \\ B + B' - \operatorname{diag}(B) & \text{if } X = X' \\ 2B - \operatorname{diag}(B) & \text{if } X = X' \text{ and } B = B'. \end{cases} \tag{3.2.11}$$

The result above has various types of applications in statistical analysis and related areas.
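The first two cases of result (vii) can be checked with exact integer arithmetic, because u = tr(BX) is linear in the elements of X and a unit step in one variable therefore gives the partial derivative exactly. The sketch below is an illustration, not the author's method; the function names are hypothetical, B is the matrix of constants from Exercise 3.2.6 (1), and the symmetric X is an arbitrary choice.

```python
def trace_BX(B, X):
    """u = tr(BX), the sum over i and k of b_ik * x_ki."""
    p = len(B)
    return sum(B[i][k] * X[k][i] for i in range(p) for k in range(p))


def grad_unrestricted(B, X):
    """Partial derivatives of u when all elements of X are distinct variables."""
    p = len(B)
    base = trace_BX(B, X)
    G = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            Y = [row[:] for row in X]
            Y[i][j] += 1  # u is linear, so a unit step gives the exact derivative
            G[i][j] = trace_BX(B, Y) - base
    return G


def grad_symmetric(B, X):
    """Partial derivatives of u when X = X', so x_ij and x_ji move together."""
    p = len(B)
    base = trace_BX(B, X)
    G = [[0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            Y = [row[:] for row in X]
            Y[i][j] += 1
            if i != j:
                Y[j][i] += 1  # the same variable also sits in position (j, i)
            G[i][j] = trace_BX(B, Y) - base
    return G


B = [[2, 1, -1], [0, 1, 5], [2, 3, -1]]
X = [[1, 2, 3], [2, 4, 5], [3, 5, 6]]  # arbitrary symmetric evaluation point

print(grad_unrestricted(B, X))  # [[2, 0, 2], [1, 1, 3], [-1, 5, -1]], that is B'
print(grad_symmetric(B, X))     # [[2, 1, 1], [1, 1, 8], [1, 8, -1]]
```

The second matrix equals B + B′ - diag(B), in agreement with (3.2.11).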
3.2.4 Products and square roots


Products and integer powers of matrices and their determinants are already considered at the end of Section 3.1. From there, many more properties follow. Even if the square matrices A and B do not commute, the determinants of AB and BA are the same.

(viii) $|AB| = |BA| = |A|\,|B| = |B|\,|A|$;
$|I - AB| = |I - BA|$ if B or A is nonsingular.

Now it is natural to ask the question: is $|A^{p/q}| = |A|^{p/q}$ where p and q are integers? Let us examine a simple case with p = 1, q = 2, or the square root of a matrix A. For a given matrix A can we find a matrix B such that $B^2 = A$? This B, if such a B exists, can be called a square root of A. If such a B exists, is it unique? Can there be different matrices whose squares are all equal to the same matrix? Let us take one of the simplest cases, a 2 × 2 identity matrix. Let


Note that

$$B_1^2 = A = B_2^2 = B_3^2 = B_4^2.$$

Thus $B_1, \ldots, B_4$ all qualify to be square roots of A. For a given matrix, even if such a B exists it need not be unique unless more conditions are imposed on A. Hence in our discussions to follow we will only consider integer powers (positive powers, or negative powers if the inverse exists) of a square matrix and the determinants associated with such powers.
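The non-uniqueness of a square root of the 2 × 2 identity matrix is easy to see by direct multiplication. The candidate matrices in the following Python sketch are illustrative choices and are not necessarily the ones used in the text; the helper names `matmul` and `det2` are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


I2 = [[1, 0], [0, 1]]

# Illustrative candidates: each one squares to the identity matrix.
roots = [
    [[1, 0], [0, 1]],
    [[-1, 0], [0, -1]],
    [[1, 0], [0, -1]],
    [[0, 1], [1, 0]],
    [[2, -3], [1, -2]],
]

for B in roots:
    print(B, matmul(B, B) == I2, det2(B))
```

Every candidate has determinant +1 or -1, as it must, since $|B|^2 = |B^2| = |I| = 1$; the determinant alone therefore does not single out one square root.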

In Chapter 2 we have seen that when a matrix A is nonsingular then in most cases it can be reduced to the form

$$A = LU$$

through elementary transformations, where L is lower triangular and U is upper triangular. Then

$$|A| = |L|\,|U|$$

which is equal to the product of all the diagonal elements in L and U.
3.2.5 Cramer's rule for solving systems of linear equations


As another application of the cofactor expansion of a determinant one can examine the solutions of a nonsingular system of linear equations. Let $A = (a_{ij})$ be an n × n nonsingular matrix and consider the system of linear equations

$$AX = b \;\Rightarrow\; X = A^{-1}b, \qquad b' = (b_1, \ldots, b_n), \quad X' = (x_1, \ldots, x_n).$$

Writing $A^{-1}$ in terms of the cofactor matrix we have

$$X = A^{-1}b = \frac{1}{|A|}\begin{bmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \vdots & \vdots & & \vdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

where $|C_{ij}|$ is the cofactor of $a_{ij}$. Then the i-th element in X is given by

$$x_i = \frac{1}{|A|}\{b_1|C_{1i}| + b_2|C_{2i}| + \cdots + b_n|C_{ni}|\}.$$

The numerator on the right side corresponds to the cofactor expansion of a determinant in terms of the i-th column, with the i-th column being b and all other columns the same as those of A. Therefore

$$x_i = \frac{|A_i|}{|A|}, \quad i = 1, 2, \ldots, n \tag{3.2.12}$$

where $|A_i|$ is the determinant of A with the i-th column of A replaced by b and the other columns remaining the same. [In a practical situation we will try to solve the system through elementary transformations, which may work out to be much easier and faster than evaluating determinants of the type in (3.2.12).] The rule in (3.2.12) is called Cramer's rule and it is more of theoretical interest than of practical use.
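The rule (3.2.12) can be written as a short Python routine. This is a minimal sketch under stated assumptions, not a recommended solver: the names `det` and `cramer` are hypothetical, exact rational arithmetic from the standard `fractions` module is used, and the 3 × 3 system below is an illustrative one chosen so that its solution is (1, 2, 3).

```python
from fractions import Fraction


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 0:
        return 1  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cramer(A, b):
    """Solve AX = b by Cramer's rule, x_i = |A_i| / |A|."""
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0: Cramer's rule does not apply")
    solution = []
    for i in range(len(A)):
        # A_i is A with its i-th column replaced by b
        A_i = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i)) / Fraction(d))
    return solution


# Illustrative system (not from the text).
A = [[2, 1, -1], [1, 3, 2], [1, 0, 1]]
b = [1, 13, 4]
print(cramer(A, b))  # [Fraction(1, 1), Fraction(2, 1), Fraction(3, 1)]
```

The routine evaluates n + 1 determinants of order n, which is why elimination is preferred in practice.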


Example 3.2.8. Solve the following system of equations by Cramer's rule, if applicable:

$$\begin{aligned} x_1 + x_2 + x_3 &= 3 \\ x_1 - x_3 &= 0 \\ 2x_1 + x_2 + 2x_3 &= 5. \end{aligned}$$
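Solution 3.2.8. Writing the system as AX = b we have

$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ 2 & 1 & 2 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad b = \begin{bmatrix} 3 \\ 0 \\ 5 \end{bmatrix}.$$

First we need to compute |A|. If |A| = 0 then the rule does not apply.

$$\begin{aligned} |A| &= \begin{vmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \\ 2 & 1 & 2 \end{vmatrix} \\ &= \begin{vmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & -1 & 0 \end{vmatrix} \quad [-1(1) + (2);\ -2(1) + (3) \Rightarrow] \\ &= \begin{vmatrix} 1 & 1 & 1 \\ 0 & -1 & -2 \\ 0 & 0 & 2 \end{vmatrix} \quad [-1(2) + (3) \Rightarrow] \\ &= (1)(-1)(2) = -2. \end{aligned}$$

The rule is applicable. Replace the first column of A by b and evaluate the determinant. According to our notation,

$$|A_1| = \begin{vmatrix} 3 & 1 & 1 \\ 0 & 0 & -1 \\ 5 & 1 & 2 \end{vmatrix}.$$

Expanding in terms of the elements of the second row and their cofactors we have

$$|A_1| = -(-1)\begin{vmatrix} 3 & 1 \\ 5 & 1 \end{vmatrix} = -2.$$

Now, replace the second column by b. Then

$$|A_2| = \begin{vmatrix} 1 & 3 & 1 \\ 1 & 0 & -1 \\ 2 & 5 & 2 \end{vmatrix}.$$

Expanding in terms of the elements of the second row and their cofactors we have

$$|A_2| = -(1)\begin{vmatrix} 3 & 1 \\ 5 & 2 \end{vmatrix} - (-1)\begin{vmatrix} 1 & 3 \\ 2 & 5 \end{vmatrix} = -(1)(1) - (-1)(-1) = -2.$$

Now,

$$|A_3| = \begin{vmatrix} 1 & 1 & 3 \\ 1 & 0 & 0 \\ 2 & 1 & 5 \end{vmatrix}.$$

Expanding in terms of the elements of the second row and their cofactors we have

$$|A_3| = -(1)\begin{vmatrix} 1 & 3 \\ 1 & 5 \end{vmatrix} = -(1)(2) = -2.$$

Hence

$$x_1 = \frac{|A_1|}{|A|} = \frac{-2}{-2} = 1 = x_2 = x_3.$$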

Note that when evaluating the above determinants we looked for the rows or columns containing the maximum number of zeros. Then we used a cofactor expansion in terms of the elements of that row (column). If a cofactor expansion is going to be used to evaluate a determinant then this is a rule of thumb. Note also that Cramer's rule is rather lengthy because, in general, n + 1 determinants of n × n matrices are to be evaluated to complete the process. In practice, the easiest way to solve a linear system is to go through elementary operations, which can also determine, at the same time, whether the system is consistent, singular, nonsingular, with many solutions or with a unique solution.

The configuration of signs when using a cofactor expansion to evaluate a determinant can be remembered from the following matrix format:

$$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}.$$


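Before we conclude this section we may observe a few more minor points. Consider a product of several n × n nonsingular matrices, for example, a product of three, ABC. Then

$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1} \;\Rightarrow\; \frac{[\operatorname{cof}(ABC)]'}{|ABC|} = \frac{[\operatorname{cof}(C)]'}{|C|}\,\frac{[\operatorname{cof}(B)]'}{|B|}\,\frac{[\operatorname{cof}(A)]'}{|A|}.$$

Therefore, since |ABC| = |A| |B| |C|,

(ix) $[\operatorname{cof}(ABC)]' = [\operatorname{cof}(C)]'\,[\operatorname{cof}(B)]'\,[\operatorname{cof}(A)]'$.

Let us see what happens to the inverse of A′.

$$(A')^{-1} = \frac{[\operatorname{cof}(A')]'}{|A'|} = \frac{\operatorname{cof}(A)}{|A|} = (A^{-1})',$$

since |A| = |A′|. That is,

(x) $(A')^{-1} = (A^{-1})'$
and

$$[A'B'C']^{-1} = (C^{-1})'(B^{-1})'(A^{-1})'.$$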


Exercises 3.2 


3.2.1. If the following matrix is denoted as $A = (a_{ij})$ then evaluate the cofactors and minors of $a_{11}, a_{21}, a_{31}, a_{33}$, where


3.2.2. For the matrix A in Exercise 3.2.1 compute the leading minors. 


3.2.3. Expand |A| in terms of the elements of (a) the first row and their cofactors, 
(b) the third row and the corresponding cofactors, (c) the second column and the cor- 
responding cofactors, where A is the same matrix in Exercise 3.2.1. 


3.2.4. Verify property (ii) by expanding |A| of the matrix in Exercise 3.2.1 in terms of the 
elements of the (a) first row and cofactors of the elements of the third row, (b) second 
row and the cofactors of the elements of the third row. 


3.2.5. Evaluate $A^{-1}$, if it exists, by computing the cofactor matrix, where


3.2.6. By multiplying and then taking the trace verify the result (3.2.11) if

(1) $X = (x_{ij})$, $B = \begin{bmatrix} 2 & 1 & -1 \\ 0 & 1 & 5 \\ 2 & 3 & -1 \end{bmatrix}$, the $x_{ij}$'s are distinct;
(2) X = X′ and the same B as in (1);
(3) X = X′ and $B = \begin{bmatrix} 2 & 1 & -1 \\ 1 & 1 & 5 \\ -1 & 5 & -1 \end{bmatrix}$.

3.2.7. From the mechanical rule of Section 3.1 write down the explicit form of the determinant of a 3 × 3 matrix $X = (x_{ij})$. From this explicit form compute $\frac{\partial |X|}{\partial X}$ for the cases: (1) all elements of X are distinct, (2) X = X′. Thus verify the result in (3.2.8).
3.2.8. Consider the n × n matrices

$$B = \frac{1}{n}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}, \qquad A = I_n - B.$$
Compute the following: 
(1) |B?8!|, (2) [4922 (3) [A250 + p». 


3.2.9. Are the following statements true or false? If false give two counter examples each: (1) |-A| = -|A|; (2) If A′ = -A then |A′| = -|A| and since |A′| = |A|, |A| = 0; (3) If A is a matrix with real elements and if A = A′ then A can be written as A = BB′ where B is a matrix with real elements.


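3.2.10. Show that

$$\begin{vmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{vmatrix} = (3 + 1) = 4$$

and that for an n × n matrix

$$\begin{vmatrix} 2 & -1 & 0 & \cdots & 0 & 0 \\ -1 & 2 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 2 & -1 \\ 0 & 0 & 0 & \cdots & -1 & 2 \end{vmatrix} = (n + 1).$$

3.2.11. Let $D_n$ be the n × n determinant

$$D_n = \begin{vmatrix} 1 & -1 & 0 & \cdots & 0 & 0 \\ 1 & 1 & -1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 1 \end{vmatrix}.$$

Show that $D_n = D_{n-1} + D_{n-2}$. [This recurrence relation produces the Fibonacci numbers 1, 2, 3, 5, 8, 13, 21, ....]

3.2.12. Let $A = (a_{ij})$ be an n × n matrix where $a_{ij} = i + j$. Evaluate the determinant of A.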


3.2.13. Let 


Evaluate the determinant of A2°. 
3.2.14. Let


, azbz0. 


sor E Y 
“ra fT 
sa oft 
a oO cc 


Evaluate the determinant of 4109, 


3.2.15. Show that (1) $A^{-1}$ is symmetric if A is symmetric, (2) $T^{-1}$ is lower (upper) triangular if T is lower (upper) triangular.


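3.2.16. Show that the determinant of the n × n matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ -a_0 & -a_1 & -a_2 & \cdots & -a_{n-2} & -a_{n-1} \end{bmatrix}$$

is equal to $(-1)^n a_0$.

3.2.17. Let

$$A_n = \begin{bmatrix} a_1 & b_1 & 0 & 0 & \cdots & 0 & 0 \\ c_1 & a_2 & b_2 & 0 & \cdots & 0 & 0 \\ 0 & c_2 & a_3 & b_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & 0 & \cdots & c_{n-1} & a_n \end{bmatrix}.$$

Show that its determinant, $|A_n|$, can be written as

$$|A_n| = a_n|A_{n-1}| - b_{n-1}c_{n-1}|A_{n-2}| \quad \text{for } n \geq 3.$$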

3.2.18. Let the n x n matrix A be partitioned as follows, where A, is p x p: 


Show that 
|A| = (-1)*YP |A,| |4,]. 


3.2.19. Let Cof(A) denote the matrix of cofactors of the n × n matrix A. Then show that the determinant of this cofactor matrix is given by

$$|\operatorname{Cof}(A)| = |A|^{n-1}.$$
3.2.20. Let I, A, B be n × n. Show that if I + AB is nonsingular then I + BA is nonsingular and that

$$(I + AB)^{-1} = I - A(I + BA)^{-1}B.$$


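3.2.21. If A is n × n, U is n × 1, V is n × 1 and if A + UV′ and A are nonsingular then show that

$$(A + UV')^{-1} = A^{-1} - \frac{(A^{-1}UV'A^{-1})}{1 + V'A^{-1}U}.$$

3.2.22. Let $A = (a_{ij}(x))$ be an n × n matrix where the elements $a_{ij}$'s are differentiable functions of x. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ denote the columns of A so that

$$A = (\alpha_1, \ldots, \alpha_n) \quad\text{and}\quad |A| = |(\alpha_1, \ldots, \alpha_n)|.$$

Let $\frac{d}{dx}\alpha_j$ denote the vector of derivatives of the elements in $\alpha_j$. Then show that

$$\frac{d}{dx}|A| = \left|\left(\frac{d}{dx}\alpha_1, \alpha_2, \ldots, \alpha_n\right)\right| + \left|\left(\alpha_1, \frac{d}{dx}\alpha_2, \ldots, \alpha_n\right)\right| + \cdots + \left|\left(\alpha_1, \ldots, \alpha_{n-1}, \frac{d}{dx}\alpha_n\right)\right|.$$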


3.2.23. Hadamard's inequality. Let $A = (a_{ij})$ be an n × n matrix. Show that

$$|A|^2 \leq \prod_{i=1}^{n}\left\{\sum_{j=1}^{n}|a_{ij}|^2\right\}.$$


3.2.24. For any two m × n matrices A and B show that rank(A + B) ≤ rank(A) + rank(B).

3.2.25. If A and B are matrices such that AB is defined then show that rank(AB) ≤ min(rank(A), rank(B)).


3.2.26. If A is an n × n nonsingular matrix and if B is n × m and C is p × n then show that

rank(AB) = rank(B), rank(CA) = rank(C).

3.2.27. If A is m × n and B is n × m with m > n then show that |AB| = 0.


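3.2.28. Circulant matrix. Evaluate the determinant of the circulant matrix

$$A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{bmatrix}.$$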


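3.2.29. Show that $|A| = c^{n-1}(nx + c)$ where A is the n × n matrix

$$A = \begin{bmatrix} x + c & x & \cdots & x \\ x & x + c & \cdots & x \\ \vdots & \vdots & \ddots & \vdots \\ x & x & \cdots & x + c \end{bmatrix}.$$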
3.3 Some practical situations


The theory of determinants has applications in all sorts of practical problems as well 
as in theoretical developments of many other fields. A few of these will be listed in this 
section. 


3.3.1 Cross product 


A concept called the cross product of two vectors in 3-space is found in elementary
trigonometry, with applications in physics, chemistry and engineering problems.
In the notation of Chapter 1 let

\[
\vec{a} = a_1\vec{i} + a_2\vec{j} + a_3\vec{k}, \quad
\vec{i} = (1,0,0),\ \vec{j} = (0,1,0),\ \vec{k} = (0,0,1), \quad\text{and}\quad
\vec{b} = b_1\vec{i} + b_2\vec{j} + b_3\vec{k}
\]

be two vectors in 3-space. Consider the parallelogram generated by these vectors on
the plane determined by $\vec{a}$ and $\vec{b}$ as shown in Figure 3.3.1.


Figure 3.3.1: Parallelogram. 


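Let us try to construct a vector $\vec{c}$ which is orthogonal to both $\vec{a}$ and $\vec{b}$ and whose
length is equal to the area of the parallelogram generated by $\vec{a}$ and $\vec{b}$. Such a vector is
usually denoted by

\[ \vec{c} = \vec{a} \times \vec{b}, \]

the cross product of $\vec{a}$ with $\vec{b}$. From elementary considerations it can be shown that
the vector $\vec{c}$ is obtained by opening up the following $3 \times 3$ determinant:

\[
\vec{c} = \vec{a} \times \vec{b} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k}\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3 \end{vmatrix}. \tag{3.3.1}
\]

Opening up in terms of the elements of the first row and their cofactors, treating $\vec{i}, \vec{j}, \vec{k}$
as some elements of the matrix, we have

\[
\begin{aligned}
\vec{a} \times \vec{b} &= \vec{i}\begin{vmatrix} a_2 & a_3\\ b_2 & b_3 \end{vmatrix} - \vec{j}\begin{vmatrix} a_1 & a_3\\ b_1 & b_3 \end{vmatrix} + \vec{k}\begin{vmatrix} a_1 & a_2\\ b_1 & b_2 \end{vmatrix}\\
&= (a_2b_3 - a_3b_2)\vec{i} - (a_1b_3 - a_3b_1)\vec{j} + (a_1b_2 - a_2b_1)\vec{k}.
\end{aligned} \tag{3.3.2}
\]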


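Example 3.3.1. Construct the cross product of $\vec{a} = \vec{i} + \vec{j} - \vec{k}$ with $\vec{b} = 2\vec{i} + \vec{j} + \vec{k}$ by using
(3.3.1) and show that this cross product vector is orthogonal to both $\vec{a}$ and $\vec{b}$ and that its
length is equal to the area of the parallelogram generated by $\vec{a}$ and $\vec{b}$.

Solution 3.3.1. From (3.3.1)

\[
\begin{aligned}
\vec{c} = \vec{a} \times \vec{b} &= \begin{vmatrix} \vec{i} & \vec{j} & \vec{k}\\ 1 & 1 & -1\\ 2 & 1 & 1 \end{vmatrix}\\
&= \vec{i}\begin{vmatrix} 1 & -1\\ 1 & 1 \end{vmatrix} - \vec{j}\begin{vmatrix} 1 & -1\\ 2 & 1 \end{vmatrix} + \vec{k}\begin{vmatrix} 1 & 1\\ 2 & 1 \end{vmatrix}\\
&= 2\vec{i} - 3\vec{j} - \vec{k}.
\end{aligned}
\]

The dot product between $\vec{c}$ and $\vec{a}$ is then

\[ \vec{c}\cdot\vec{a} = (2)(1) + (-3)(1) + (-1)(-1) = 0. \]

The dot product between $\vec{c}$ and $\vec{b}$ is

\[ \vec{c}\cdot\vec{b} = (2)(2) + (-3)(1) + (-1)(1) = 0. \]

Thus the cross product vector of $\vec{a}$ with $\vec{b}$ is orthogonal to both $\vec{a}$ and $\vec{b}$. The length of
$\vec{c}$ is given by

\[ \|\vec{c}\| = \sqrt{(2)^2 + (-3)^2 + (-1)^2} = \sqrt{14}. \]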


For any two vectors $\vec{U}$ and $\vec{V}$ in $n$-space the area of the parallelogram on the plane
containing $\vec{U}$ and $\vec{V}$ is given by the following expression, see Figure 3.3.1.

\[ \text{area} = \|\vec{U}\|\,\|\vec{V}\|\sin\theta. \]

[Twice the area of the triangle = 2(1/2) base times the altitude = base times the altitude
$= (\|\vec{U}\|)(\|\vec{V}\|\sin\theta)$] where $\theta$ is the angle between $\vec{U}$ and $\vec{V}$. But

\[
\sin\theta = \sqrt{1 - \cos^2\theta} = \sqrt{1 - \frac{(\vec{U}\cdot\vec{V})^2}{\|\vec{U}\|^2\,\|\vec{V}\|^2}} \tag{3.3.3}
\]

holds for all $\vec{U}$ and $\vec{V}$, $\vec{U} \ne O$, $\vec{V} \ne O$. Then the area of the parallelogram is

\[
\text{area} = \|\vec{U}\|\,\|\vec{V}\|\sin\theta = \sqrt{\|\vec{U}\|^2\,\|\vec{V}\|^2 - (\vec{U}\cdot\vec{V})^2}. \tag{3.3.4}
\]

Consider two vectors $\vec{a}$ and $\vec{b}$ in 3-space. Substituting in (3.3.4) we have

\[ \text{area} = \sqrt{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1b_1 + a_2b_2 + a_3b_3)^2}. \]

Simplifying and rewriting we have

\[
\text{area} = \sqrt{(a_2b_3 - a_3b_2)^2 + (a_1b_3 - a_3b_1)^2 + (a_1b_2 - a_2b_1)^2} = \|\vec{a} \times \vec{b}\|.
\]

In our illustrative example the quantity under the square root is

\[ [(1)(1) - (-1)(1)]^2 + [(1)(1) - (-1)(2)]^2 + [(1)(1) - (1)(2)]^2 = 14, \]

so the area is $\sqrt{14} = \|\vec{c}\|$. This verifies the result.

One observation is immediate. If the second and third rows of (3.3.1) are interchanged
then the resulting determinant is $(-1)$ times the original determinant which
implies that

\[ \vec{a} \times \vec{b} = -\vec{b} \times \vec{a}. \tag{3.3.5} \]
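The cofactor expansion (3.3.2) and the checks of Example 3.3.1 can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper names `cross` and `dot` are chosen here and are not part of the text.

```python
def cross(a, b):
    """Cofactor expansion of (3.3.1) along its first row, as in (3.3.2)."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            -(a1 * b3 - a3 * b1),
            a1 * b2 - a2 * b1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a = (1, 1, -1)   # a = i + j - k
b = (2, 1, 1)    # b = 2i + j + k
c = cross(a, b)

# Squared area from (3.3.4): ||a||^2 ||b||^2 - (a.b)^2
area_sq = dot(a, a) * dot(b, b) - dot(a, b) ** 2

print(c)                       # (2, -3, -1)
print(dot(c, a), dot(c, b))    # 0 0
print(area_sq, dot(c, c))      # 14 14
print(cross(b, a))             # (-2, 3, 1)
```

The last line illustrates (3.3.5): interchanging the two vectors changes the sign of every component.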


3.3.2 Areas and volumes 


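Consider two vectors $\vec{a}$ and $\vec{b}$ in 2-space as shown in Figure 3.3.2. The lengths of $\vec{a}$ and
$\vec{b}$ are

\[ \|\vec{a}\| = \sqrt{a_1^2 + a_2^2}, \quad \|\vec{b}\| = \sqrt{b_1^2 + b_2^2}. \]

Let $\theta_1$ be the angle $\vec{a}$ makes with the $x$-axis and $\theta_2$ the angle $\vec{b}$ makes with the $x$-axis
and $\theta = \theta_2 - \theta_1$, the angle between $\vec{a}$ and $\vec{b}$. One choice of $\theta_1, \theta_2, \theta$ is shown in
Figure 3.3.2. [The final result will be true for all choices of $\theta_1, \theta_2, \theta$.]

\[
\begin{aligned}
\cos\theta = \cos(\theta_2 - \theta_1) &= \cos\theta_2\cos\theta_1 + \sin\theta_2\sin\theta_1\\
&= \frac{b_1}{\sqrt{b_1^2 + b_2^2}}\,\frac{a_1}{\sqrt{a_1^2 + a_2^2}} + \frac{b_2}{\sqrt{b_1^2 + b_2^2}}\,\frac{a_2}{\sqrt{a_1^2 + a_2^2}}\\
&= \frac{\vec{a}\cdot\vec{b}}{\|\vec{a}\|\,\|\vec{b}\|}.
\end{aligned}
\]

Figure 3.3.2: Area.

The area of the parallelogram is then

\[
\begin{aligned}
\text{area} = \|\vec{a}\|\,\|\vec{b}\|\sin\theta &= \sqrt{\|\vec{a}\|^2\,\|\vec{b}\|^2 - (\vec{a}\cdot\vec{b})^2}\\
&= \sqrt{(a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1b_1 + a_2b_2)^2}\\
&= \sqrt{(a_1b_2 - a_2b_1)^2},
\end{aligned}
\]

or the area is the absolute value of the determinant with $\vec{a}$ and $\vec{b}$ as its first and second
rows:

\[ \text{area} = \text{absolute value of } \begin{vmatrix} a_1 & a_2\\ b_1 & b_2 \end{vmatrix}. \]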

Now, consider the parallelepiped generated by the three vectors $\vec{a}, \vec{b}, \vec{c}$ in a 3-space.
The volume of the parallelepiped is the base area multiplied by the altitude. Consider
the base area of the parallelogram generated by $\vec{b}$ and $\vec{c}$. The area is the length of the
cross product, $\|\vec{b} \times \vec{c}\|$. The altitude is equal to $\|\vec{a}\|\,|\cos\theta|$ where $\theta$ is the angle $\vec{a}$
makes with the normal $\vec{N} = \vec{b} \times \vec{c}$ as shown in Figure 3.3.3.

Figure 3.3.3: Volume of a parallelepiped.

Therefore the volume, denoted by $v$, is given by

\[
\begin{aligned}
v &= \|\vec{b} \times \vec{c}\|\,\|\vec{a}\|\,\frac{|\vec{a}\cdot(\vec{b} \times \vec{c})|}{\|\vec{a}\|\,\|\vec{b} \times \vec{c}\|} \quad (\text{substituting for } \cos\theta)\\
&= |\vec{a}\cdot(\vec{b} \times \vec{c})|\\
&= \bigl|a_1[b_2c_3 - b_3c_2] - a_2[b_1c_3 - b_3c_1] + a_3[b_1c_2 - b_2c_1]\bigr|.
\end{aligned}
\]

Thus the volume of the parallelepiped generated by the vectors $\vec{a}$, $\vec{b}$ and $\vec{c}$ in a 3-space
is the absolute value of the following determinant:

\[
v = \text{absolute value of } \begin{vmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{vmatrix}. \tag{3.3.6}
\]

Note that $v = 0$ if any of the vectors is a linear function of the others. For example if $\vec{a}$
lies on the plane determined by $\vec{b}$ and $\vec{c}$ then $v = 0$. This formula can be generalized.
Let $O$ be the origin of a rectangular coordinate system. Let

\[
\begin{aligned}
X_1 &= (x_{11}, x_{12}, \ldots, x_{1n}),\\
X_2 &= (x_{21}, x_{22}, \ldots, x_{2n}),\\
&\ \,\vdots\\
X_n &= (x_{n1}, x_{n2}, \ldots, x_{nn})
\end{aligned}
\]

be $n$ points and consider the vectors $\overrightarrow{OX_1}, \ldots, \overrightarrow{OX_n}$. Assuming that these are linearly
independent, the volume of the parallelotope generated by $\overrightarrow{OX_1}, \ldots, \overrightarrow{OX_n}$ is given by
the absolute value of the determinant

\[ v_n = \text{absolute value of } |X| = |XX'|^{\frac{1}{2}} \tag{3.3.7} \]

where

\[
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1n}\\
x_{21} & x_{22} & \cdots & x_{2n}\\
\vdots & \vdots & & \vdots\\
x_{n1} & x_{n2} & \cdots & x_{nn}
\end{bmatrix}.
\]

Note that $|XX'| = |X|\,|X'| = |X|^2$. But $|XX'|$ remains non-negative and hence by using
this form we do not have to worry about the absolute value. This form is also useful in
dealing with $r$ points in $n$-space, $r < n$. If there are $n + 1$ points $X_1, \ldots, X_{n+1}$ in an $n$-space,
such as 3 points in a 2-space, then we can shift the origin to one of the points and the
situation will be as in (3.3.7).

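The volume of this parallelotope can be shown to be given by the following determinant,
in absolute value:

\[
\begin{vmatrix}
x_{11} & \cdots & x_{1n} & 1\\
\vdots & & \vdots & \vdots\\
x_{n1} & \cdots & x_{nn} & 1\\
x_{n+1,1} & \cdots & x_{n+1,n} & 1
\end{vmatrix}
=
\begin{vmatrix}
x_{11} - x_{n+1,1} & \cdots & x_{1n} - x_{n+1,n} & 0\\
\vdots & & \vdots & \vdots\\
x_{n1} - x_{n+1,1} & \cdots & x_{nn} - x_{n+1,n} & 0\\
x_{n+1,1} & \cdots & x_{n+1,n} & 1
\end{vmatrix}
=
\begin{vmatrix}
x_{11} - x_{n+1,1} & \cdots & x_{1n} - x_{n+1,n}\\
\vdots & & \vdots\\
x_{n1} - x_{n+1,1} & \cdots & x_{nn} - x_{n+1,n}
\end{vmatrix}
=
\begin{vmatrix} X_1 - X_{n+1}\\ \vdots\\ X_n - X_{n+1} \end{vmatrix}. \tag{3.3.8}
\]

Figure 3.3.4: $n + 1$ points in $n$-space.

The origin is shifted to the point $X_{n+1}$, as indicated in Figure 3.3.4; then the other points
are $X_i - X_{n+1}$, $i = 1, 2, \ldots, n$, with respect to the new coordinate system. Thus (3.3.8) agrees
with (3.3.7), where in (3.3.7), $X_{n+1}$ is the origin $O$ itself.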


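Example 3.3.2. Evaluate the volume (area in 2-space) of the parallelotope (parallelepiped
in 3-space) created by the vectors $\overrightarrow{OX_1}, \ldots, \overrightarrow{OX_n}$, where $O$ indicates the origin and
(a) $X_1 = (1, 1)$, $X_2 = (1, -1)$,
(b) $X_1 = (2, 0, -4)$, $X_2 = (1, 1, -1)$, $X_3 = (1, 0, 1)$,
(c) $X_1 = (1, 1, 1, 1)$, $X_2 = (1, 1, 1, -1)$, $X_3 = (1, 1, -1, 1)$, $X_4 = (1, -1, 1, 1)$.

Solution 3.3.2. (a) In this case we have the area of the parallelogram created by
$\overrightarrow{OX_1}$, $\overrightarrow{OX_2}$, denoted by $v_2$. Then

\[ \begin{vmatrix} 1 & 1\\ 1 & -1 \end{vmatrix} = -2. \]

The absolute value = $v_2 = 2$.
(b) In this case we have the volume of a parallelepiped, denoted by $v_3$. Consider

\[
\begin{vmatrix} 2 & 0 & -4\\ 1 & 1 & -1\\ 1 & 0 & 1 \end{vmatrix}
= -\begin{vmatrix} 1 & 0 & 1\\ 1 & 1 & -1\\ 2 & 0 & -4 \end{vmatrix}
= -\begin{vmatrix} 1 & 0 & 1\\ 0 & 1 & -2\\ 0 & 0 & -6 \end{vmatrix}
= -(1)(1)(-6) = 6.
\]

The absolute value is 6 and hence the volume is 6.
(c) In this case we have the volume of a parallelotope in 4-space, denoted by $v_4$.
Consider

\[
\begin{vmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & 1 & -1\\ 1 & 1 & -1 & 1\\ 1 & -1 & 1 & 1 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1 & 1\\ 0 & 0 & 0 & -2\\ 0 & 0 & -2 & 0\\ 0 & -2 & 0 & 0 \end{vmatrix}
= -\begin{vmatrix} 1 & 1 & 1 & 1\\ 0 & -2 & 0 & 0\\ 0 & 0 & -2 & 0\\ 0 & 0 & 0 & -2 \end{vmatrix}
= 8.
\]

The absolute value is 8 and hence $v_4 = 8$.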
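The three determinants of Example 3.3.2 can be checked mechanically. The sketch below reduces each matrix to triangular form with exact fractions, as in the solution, and also evaluates the form $|XX'|$ of (3.3.7) for $r = 2$ points in 3-space, using the two vectors of Example 3.3.1. The function names `det` and `gram` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction to triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for k in range(n):
        # find a nonzero pivot in column k; a row interchange flips the sign
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            sign = -sign
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            m[r] = [x - factor * y for x, y in zip(m[r], m[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= m[k][k]
    return result

def gram(rows):
    """The matrix XX' of inner products of the rows of X."""
    return [[sum(x * y for x, y in zip(u, v)) for v in rows] for u in rows]

v2 = abs(det([[1, 1], [1, -1]]))
v3 = abs(det([[2, 0, -4], [1, 1, -1], [1, 0, 1]]))
v4 = abs(det([[1, 1, 1, 1], [1, 1, 1, -1], [1, 1, -1, 1], [1, -1, 1, 1]]))
print(v2, v3, v4)                      # 2 6 8

# r = 2 points in 3-space: |XX'| is the squared area of the parallelogram
area_sq = det(gram([[1, 1, -1], [2, 1, 1]]))
print(area_sq)                         # 14
```

The value 14 agrees with the squared length of the cross product found in Example 3.3.1.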

In a 2-space we have a parallelogram generated by $\overrightarrow{OX_1}$ and $\overrightarrow{OX_2}$, by completing
the parallelogram. This parallelogram consists of two identical triangles. Then the
area of one such triangle is $\frac{1}{2}v_2$ where $v_2$ is the area of the parallelogram. In a 3-space
we have a parallelepiped. How many identical simplexes (simplices, 3-dimensional
analogue of the triangle) can be packed into this parallelepiped? It can be easily
seen that we can pack $6 = 3!$ such simplexes. The following are some standard notations
in this area. $\nabla_n$ (nabla) and $\Delta_n$ (delta) are used to denote the volumes of the
$n$-parallelotope and $n$-simplex respectively.

Notation 3.3.1.

\[ \nabla_n = \text{volume of an } n\text{-parallelotope in } n\text{-space}, \]
\[ \Delta_n = \text{volume of an } n\text{-simplex in } n\text{-space}. \]

Then we have

\[ \Delta_n = \frac{1}{n!}\nabla_n. \tag{3.3.9} \]
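As a small illustration of (3.3.8) and (3.3.9) together, the following sketch shifts the origin to the last of $n + 1$ points, takes the determinant of the differences and divides by $n!$. The sample points and the helper names `det` and `simplex_volume` are assumptions made for this illustration only.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def det(m):
    """Determinant from the permutation expansion (adequate for small n)."""
    n = len(m)
    total = Fraction(0)
    for perm in permutations(range(n)):
        # sign of the permutation = (-1)^(number of inversions)
        inv = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = Fraction(-1 if inv % 2 else 1)
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def simplex_volume(points):
    """Volume of the n-simplex on n + 1 points in n-space, via (3.3.8) and (3.3.9)."""
    *others, last = points
    shifted = [[x - y for x, y in zip(p, last)] for p in others]   # origin moved to the last point
    return abs(det(shifted)) / factorial(len(shifted))

# Triangle with vertices (4,0), (0,3), (0,0): area 6
print(simplex_volume([(4, 0), (0, 3), (0, 0)]))                       # 6
# The same triangle translated by (1,1): the area is unchanged
print(simplex_volume([(5, 1), (1, 4), (1, 1)]))                       # 6
# Tetrahedron on the unit coordinate vectors: volume 1/3! = 1/6
print(simplex_volume([(1, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0)]))   # 1/6
```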


3.3.3 Jacobians of transformations 


In a calculus course the instructor might have told you that the Jacobian is a determinant,
and curious students must have wondered how a determinant enters into the
picture. Let us see what happens if we take the skew symmetric product of differentials.

Notation 3.3.2. $\wedge$ = (wedge), $dx \wedge dy$ (skew symmetric product of the differential $dx$
with the differential $dy$), where $x$ and $y$ are real scalar independent or free variables.

Definition 3.3.1. The skew symmetric product of the differential of $x$, $dx$, with the differential
of $y$, $dy$, is defined as

\[
dx \wedge dy = -dy \wedge dx \;\Rightarrow\; dx \wedge dx = -dx \wedge dx \;\Rightarrow\; dx \wedge dx = 0,
\]

since $dx \wedge dx$ is a real scalar quantity. Let us consider some transformations. Let $x$ and
$y$ be two real free variables and let $u$ and $v$ be functions of $x$ and $y$. Let

\[ u = f_1(x, y), \quad v = f_2(x, y). \tag{3.3.10} \]
As examples, 

(a) $u = 2x - 3y$, $v = x + y$

(b) $u = x^2 + y^2$, $v = x - y$

(c) $u = x^2 + 2xy + y$, $v = x + 5$.

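Taking the differentials in (3.3.10) we have, from elementary calculus,

\[ du = \frac{\partial f_1}{\partial x}dx + \frac{\partial f_1}{\partial y}dy \quad\text{and} \tag{a} \]
\[ dv = \frac{\partial f_2}{\partial x}dx + \frac{\partial f_2}{\partial y}dy. \tag{b} \]

Let us take the skew symmetric product of the differentials in $u$ and $v$.

\[
du \wedge dv = \left[\frac{\partial f_1}{\partial x}dx + \frac{\partial f_1}{\partial y}dy\right] \wedge \left[\frac{\partial f_2}{\partial x}dx + \frac{\partial f_2}{\partial y}dy\right]. \tag{c}
\]

According to Definition 3.3.1 an interchange brings in a negative sign and hence when
taking the product in (c) remember to keep the order and change the sign if the order
is reversed. Multiplying out the right side in (c), keeping the order and
noting that terms such as $dx \wedge dx$ and $dy \wedge dy$ are zeros, we have

\[
\begin{aligned}
du \wedge dv &= \frac{\partial f_1}{\partial x}\frac{\partial f_2}{\partial x}\,dx \wedge dx + \frac{\partial f_1}{\partial x}\frac{\partial f_2}{\partial y}\,dx \wedge dy\\
&\quad + \frac{\partial f_1}{\partial y}\frac{\partial f_2}{\partial x}\,dy \wedge dx + \frac{\partial f_1}{\partial y}\frac{\partial f_2}{\partial y}\,dy \wedge dy\\
&= \frac{\partial f_1}{\partial x}\frac{\partial f_2}{\partial y}\,dx \wedge dy + \frac{\partial f_1}{\partial y}\frac{\partial f_2}{\partial x}\,dy \wedge dx + 0.
\end{aligned}
\]

Note that in one term we have $dx \wedge dy$ and in the other term $dy \wedge dx = -dx \wedge dy$. Therefore

\[
\begin{aligned}
du \wedge dv &= \left[\frac{\partial f_1}{\partial x}\frac{\partial f_2}{\partial y} - \frac{\partial f_1}{\partial y}\frac{\partial f_2}{\partial x}\right] dx \wedge dy\\
&= \begin{vmatrix} \dfrac{\partial f_1}{\partial x} & \dfrac{\partial f_1}{\partial y}\\[2mm] \dfrac{\partial f_2}{\partial x} & \dfrac{\partial f_2}{\partial y} \end{vmatrix}\, dx \wedge dy.
\end{aligned}
\]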
The coefficient of $dx \wedge dy$ is called the Jacobian $J$ and it is a determinant. If $J \ne 0$ then

\[ dx \wedge dy = \frac{1}{J}\,du \wedge dv, \tag{3.3.11} \]

and the transformation $(x, y) \to (u, v)$ is one to one. Let us generalize this procedure
to functions of many real variables. Let $x_1, \ldots, x_k$ be $k$ free real scalar variables and
consider $k$ scalar functions of $x_1, \ldots, x_k$. Let

\[ y_i = f_i(x_1, \ldots, x_k), \quad i = 1, 2, \ldots, k. \]

If the number of equations is not equal to the number of independent variables
$x_1, \ldots, x_k$ then we cannot expect a one to one transformation. Even when the two numbers
agree, for a one to one transformation we need the Jacobian to be nonzero. Only in this case one can
write $dx_1 \wedge \cdots \wedge dx_k$ in terms of $dy_1 \wedge \cdots \wedge dy_k$ and vice versa. Then proceeding as
before we note that

\[
dy_1 \wedge \cdots \wedge dy_k = \begin{vmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_k}\\
\vdots & & \vdots\\
\dfrac{\partial f_k}{\partial x_1} & \cdots & \dfrac{\partial f_k}{\partial x_k}
\end{vmatrix}\, dx_1 \wedge \cdots \wedge dx_k = J\, dx_1 \wedge \cdots \wedge dx_k
\]

where

\[ J = \text{determinant of the Jacobian matrix } \left(\frac{\partial y_i}{\partial x_j}\right). \]

The $(i,j)$-th element in the Jacobian matrix is the partial derivative of $y_i$ with respect
to $x_j$. Since the transpose of this matrix also has the same determinant we could take
the Jacobian matrix as $\left(\frac{\partial y_i}{\partial x_j}\right)$ or $\left(\frac{\partial y_i}{\partial x_j}\right)'$. Let us evaluate some Jacobians.
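The Jacobian of a transformation in two variables can also be estimated numerically, which gives a quick check on hand computations. The sketch below uses central differences for the four partial derivatives; the step size `h`, the evaluation points, and the polar-coordinate example are assumptions introduced here for illustration and do not come from the text.

```python
import math

def jacobian_2d(f1, f2, x, y, h=1e-6):
    """Central-difference estimate of the determinant of d(f1, f2)/d(x, y) at (x, y)."""
    f1x = (f1(x + h, y) - f1(x - h, y)) / (2 * h)
    f1y = (f1(x, y + h) - f1(x, y - h)) / (2 * h)
    f2x = (f2(x + h, y) - f2(x - h, y)) / (2 * h)
    f2y = (f2(x, y + h) - f2(x, y - h)) / (2 * h)
    return f1x * f2y - f1y * f2x

# Transformation (a): u = 2x - 3y, v = x + y, so J = (2)(1) - (-3)(1) = 5 at every point
j_linear = jacobian_2d(lambda x, y: 2 * x - 3 * y, lambda x, y: x + y, 0.7, -1.2)

# Polar coordinates (illustration only): x = r cos t, y = r sin t, so J = r
r, t = 2.0, 0.5
j_polar = jacobian_2d(lambda r, t: r * math.cos(t), lambda r, t: r * math.sin(t), r, t)

print(j_linear, j_polar)   # close to 5 and to r = 2, up to discretization error
```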


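(a) Jacobians of linear transformations

Consider the linear transformation

\[ y_i = a_{i1}x_1 + \cdots + a_{ik}x_k, \quad i = 1, \ldots, k. \]

This can be written as

\[
Y = AX, \quad Y = \begin{bmatrix} y_1\\ \vdots\\ y_k \end{bmatrix}, \quad X = \begin{bmatrix} x_1\\ \vdots\\ x_k \end{bmatrix}, \quad
A = (a_{ij}) = \begin{bmatrix} a_{11} & \cdots & a_{1k}\\ \vdots & & \vdots\\ a_{k1} & \cdots & a_{kk} \end{bmatrix}.
\]

Note that if $|A| \ne 0$ then $A^{-1}$ exists and then

\[ Y = AX \;\Rightarrow\; X = A^{-1}Y, \]

that is, $Y$ can be written uniquely as a function of $X$ and vice versa. In this case we have
a one-to-one transformation. The coefficient matrix $A$ in the above transformation is
a matrix of constants. The Jacobian matrix in this case is

\[ \left(\frac{\partial y_i}{\partial x_j}\right) = (a_{ij}) = A \;\Rightarrow\; J = |A|. \]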

Thus we have an interesting result. In all the results to follow we will use the following
notation.

Notation 3.3.3. When $X$ is an $m \times n$ matrix of $mn$ free real variables the skew symmetric
product of all the differentials in $X$ will be denoted as follows:

\[ dX = dx_{11} \wedge dx_{12} \wedge \cdots \wedge dx_{mn}, \]

and if $X = X'$ and $p \times p$ then

\[ dX = dx_{11} \wedge dx_{12} \wedge \cdots \wedge dx_{1p} \wedge dx_{22} \wedge \cdots \wedge dx_{pp}. \]

For example if

\[
X = \begin{bmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{bmatrix} \;\Rightarrow\; dX = dx_{11} \wedge dx_{12} \wedge dx_{21} \wedge dx_{22},
\]

and $dX = dx_{11} \wedge dx_{12} \wedge dx_{22}$ for $x_{12} = x_{21}$, that is, for $X = X'$.

When taking the variables $x_{11}, x_{12}, x_{21}, x_{22}$ to form $dX$ they can be taken in any convenient
order to start with. Once they are taken in some order then that order has to be
kept throughout that computation involving $dX$. For any transposition of differentials
during the computational process the sign rule in the definition will apply.

In the symmetric case there are only $p(p + 1)/2$ free scalar variables. From the above
notation, if $X$ is a $k \times 1$ vector then

\[ dX = dX' = dx_1 \wedge \cdots \wedge dx_k. \]

Then we can write the result in the above linear transformation as follows:

(i) $Y = AX$, $|A| \ne 0$ $\;\Rightarrow\;$ $dY = |A|\,dX$,

where $Y$ and $X$ are $k \times 1$ vectors of real scalar variables and $A = (a_{ij})$ is a matrix of constants.
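Example 3.3.3. Evaluate the Jacobian in the following linear transformation. Is the
transformation one-to-one?

\[
\begin{aligned}
y_1 &= x_1 + x_2 + x_3\\
y_2 &= x_1 - x_2 + x_3\\
y_3 &= 2x_1 + x_2 - x_3.
\end{aligned}
\]

Solution 3.3.3. Writing in matrix notation the transformation is

\[
Y = AX, \quad Y = \begin{bmatrix} y_1\\ y_2\\ y_3 \end{bmatrix}, \quad X = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ 2 & 1 & -1 \end{bmatrix}.
\]

The Jacobian, $J$, is seen as the determinant

\[
\begin{aligned}
J = \begin{vmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ 2 & 1 & -1 \end{vmatrix}
&= \begin{vmatrix} 1 & 1 & 1\\ 0 & -2 & 0\\ 0 & -1 & -3 \end{vmatrix} \quad [-1(1) + (2);\ -2(1) + (3)]\\
&= \begin{vmatrix} -2 & 0\\ -1 & -3 \end{vmatrix} = (-2)(-3) - (-1)(0) = 6.
\end{aligned}
\]

Since $J \ne 0$ the transformation is one-to-one here.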
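To see the determinant emerge from the skew symmetric product itself, one can multiply out $dy_1 \wedge dy_2 \wedge dy_3$ for Example 3.3.3 using only the sign rule of Definition 3.3.1. The sketch below stores a form as a dictionary from ordered index tuples to coefficients; this representation and the helper names are assumptions made for the illustration.

```python
def sort_sign(indices):
    """Sort the differentials by adjacent interchanges; each interchange flips the sign."""
    idx = list(indices)
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)

def wedge(form_a, form_b):
    """Skew symmetric product of two forms stored as {index tuple: coefficient}."""
    out = {}
    for ia, ca in form_a.items():
        for ib, cb in form_b.items():
            if set(ia) & set(ib):
                continue                      # a repeated differential gives zero
            sign, key = sort_sign(ia + ib)
            out[key] = out.get(key, 0) + sign * ca * cb
    return {k: v for k, v in out.items() if v != 0}

def one_form(coeffs):
    """a_1 dx_1 + ... + a_k dx_k as a dictionary."""
    return {(j,): a for j, a in enumerate(coeffs, start=1) if a != 0}

# Example 3.3.3: the rows of A give dy_1, dy_2, dy_3
dy1 = one_form([1, 1, 1])
dy2 = one_form([1, -1, 1])
dy3 = one_form([2, 1, -1])
result = wedge(wedge(dy1, dy2), dy3)
print(result)          # {(1, 2, 3): 6}
```

The single surviving coefficient, 6, is the Jacobian $|A|$ found in Solution 3.3.3.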


(b) Linear matrix transformation 


Let us consider a more general linear transformation. Let $X$ be an $m \times n$ matrix of
functionally independent $mn$ real scalar variables, let $A$ be an $m \times m$ nonsingular matrix
of constants. Consider the following one-to-one transformation (one-to-one since
$|A| \ne 0$):

\[ Y = AX, \quad A = (a_{ij}) \text{ is } m \times m, \quad X, Y \text{ are } m \times n. \]

What is the Jacobian in this transformation? If $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ denote the $n$
columns of $X$ and $Y$ respectively then we can write this transformation in the following
equivalent form:

\[ [Y_1, Y_2, \ldots, Y_n] = [AX_1, AX_2, \ldots, AX_n]. \]

Note that $Y_1$ does not contain $X_2, \ldots, X_n$, $Y_2$ contains only $X_2$ and so on. Hence if we take
the variables in $Y$ in the order $\begin{pmatrix} Y_1\\ \vdots\\ Y_n \end{pmatrix}$ and the variables in $X$ in the order $(X_1', X_2', \ldots, X_n')$
then by taking the partial derivatives, the Jacobian matrix has the following form:

\[
\begin{array}{c|cccc}
 & X_1' & X_2' & \cdots & X_n'\\ \hline
Y_1 & A & O & \cdots & O\\
Y_2 & O & A & \cdots & O\\
\vdots & \vdots & \vdots & & \vdots\\
Y_n & O & O & \cdots & A
\end{array}
\]

observing that the partial derivative of $Y_i$ with respect to $X_j'$ produces $A$ if $j = i$ and a
null matrix if $j \ne i$. The determinant of the above block diagonal Jacobian matrix is the
product of the determinants of the diagonal blocks. Hence the Jacobian is $|A|^n$. That is,

(ii) $Y = AX$, $|A| \ne 0$, $Y, X$, $m \times n$ $\;\Rightarrow\;$ $dY = |A|^n\,dX$.

What will be the effect if $X$ is postmultiplied by a nonsingular $n \times n$ constant matrix $B$
so that the transformation is one-to-one? That is,

\[ Y = XB, \quad |B| \ne 0, \quad Y \text{ and } X \text{ are } m \times n, \quad B \text{ is } n \times n. \]

This Jacobian can be evaluated by observing the following: look at the rows on both
sides and follow through the procedure above. Each of the $m$ rows of $X$ contributes a
factor $|B'| = |B|$, and we have the next result.

(iii) $Y = XB$, $|B| \ne 0$, $Y, X$, $m \times n$ $\;\Rightarrow\;$ $dY = |B|^m\,dX$.


(c) Multilinear transformations 


Combining the results in (ii) and (iii) above we have a general linear transformation
of the type $Y = AXB$, where $Y$ and $X$ are $m \times n$ matrices of $mn$ free real scalar variables
and $A$, $m \times m$, and $B$, $n \times n$, are nonsingular matrices of constants. This transformation
can be looked upon as $Z = AX$ and $Y = ZB$, or as $U = XB$ and $Y = AU$, and then the
above results can be applied. Then we have the following:

(iv) For $X$ and $Y$, $m \times n$, $|A| \ne 0$, $|B| \ne 0$, where $A$, $m \times m$, and $B$, $n \times n$, are matrices of
constants,

\[ Y = AXB \;\Rightarrow\; dY = |A|^n\,|B|^m\,dX. \]


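Example 3.3.4. Consider the following linear transformation involving the free real
variables $x_{11}, x_{12}, x_{13}, x_{21}, x_{22}, x_{23}$. Evaluate the Jacobian in this linear transformation.

\[
\begin{aligned}
y_{11} &= x_{11} + x_{21}, & y_{12} &= x_{12} + x_{22}, & y_{13} &= x_{13} + x_{23},\\
y_{21} &= x_{11} - x_{21}, & y_{22} &= x_{12} - x_{22}, & y_{23} &= x_{13} - x_{23}.
\end{aligned}
\]

Solution 3.3.4. Writing in matrix notation

\[
Y = AX, \quad Y = \begin{bmatrix} y_{11} & y_{12} & y_{13}\\ y_{21} & y_{22} & y_{23} \end{bmatrix}, \quad
X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}.
\]

Since $|A| = -2 \ne 0$ it is a one-to-one transformation. Then from property (ii) above the
Jacobian is

\[ J = |A|^n = \begin{vmatrix} 1 & 1\\ 1 & -1 \end{vmatrix}^3 = (-2)^3 = -8. \]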


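Example 3.3.5. Consider the linear transformation involving the free real variables
$x_{11}, x_{12}, x_{13}, x_{21}, x_{22}, x_{23}$:

\[
\begin{aligned}
y_{11} &= x_{11} + x_{21} + x_{12} + x_{22} + x_{13} + x_{23}, & y_{12} &= -(x_{13} + x_{23}),\\
y_{13} &= 3(x_{12} + x_{22}) + 2(x_{13} + x_{23}),\\
y_{21} &= x_{11} - x_{21} + x_{12} - x_{22} + x_{13} - x_{23}, & y_{22} &= -(x_{13} - x_{23}),\\
y_{23} &= 3(x_{12} - x_{22}) + 2(x_{13} - x_{23}).
\end{aligned}
\]

Evaluate the Jacobian in this linear transformation. Is the transformation one-to-one?

Solution 3.3.5. Writing the transformation in matrix notation we have

\[
Y = AXB, \quad A = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 & 0\\ 1 & 0 & 3\\ 1 & -1 & 2 \end{bmatrix};
\]
\[
X = \begin{bmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{bmatrix}, \quad
Y = \begin{bmatrix} y_{11} & y_{12} & y_{13}\\ y_{21} & y_{22} & y_{23} \end{bmatrix}.
\]

Here $|A| = -2 \ne 0$, $|B| = 3 \ne 0$. Hence the transformation is one-to-one. From property
(iv) the Jacobian is given by

\[ J = |A|^3\,|B|^2 = (-2)^3(3)^2 = -72. \]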
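Property (iv) can be checked directly by writing down the full $mn \times mn$ matrix of partial derivatives and taking its determinant. The sketch below does this for Examples 3.3.4 and 3.3.5, taking the variables of both $X$ and $Y$ row by row; that ordering and the function names are choices made here for illustration.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction to triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for k in range(n):
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            sign = -sign
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            m[r] = [x - factor * y for x, y in zip(m[r], m[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= m[k][k]
    return result

def jacobian_matrix(A, B):
    """Matrix of d y_ij / d x_kl for Y = AXB.

    Since y_ij = sum_k sum_l a_ik x_kl b_lj, the partial derivative is a_ik * b_lj.
    Rows are indexed by (i, j) and columns by (k, l), both taken row by row.
    """
    m, n = len(A), len(B)
    return [[A[i][k] * B[l][j] for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]

A = [[1, 1], [1, -1]]
B = [[1, 0, 0], [1, 0, 3], [1, -1, 2]]
m, n = len(A), len(B)

J = det(jacobian_matrix(A, B))
print(J, det(A) ** n * det(B) ** m)        # -72 -72

# Example 3.3.4 is the special case B = I
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(det(jacobian_matrix(A, I3)))         # -8
```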


Before concluding this subsection let us consider a nonlinear transformation. 


(d) Jacobian in a nonlinear transformation 


Let $X = (x_{ij})$ with $x_{11} > 0$, $x_{22} > 0$, $x_{11}x_{22} - x_{12}^2 > 0$ be a $2 \times 2$ symmetric matrix and $T$ a $2 \times 2$
lower triangular matrix with positive diagonal elements. Assume that it is possible to
express $X = TT'$. [Not all symmetric matrices can be written in this form. The matrices
which can be written in this form fall into the category of non-negative definite matrices.
Definiteness of matrices will be discussed in the next chapter in a systematic
way. When the matrix $X$ is positive definite then the conditions stated above will be
necessary.] Then our transformation is given by

\[
\begin{bmatrix} x_{11} & x_{12}\\ x_{12} & x_{22} \end{bmatrix}
= \begin{bmatrix} t_{11} & 0\\ t_{21} & t_{22} \end{bmatrix}\begin{bmatrix} t_{11} & t_{21}\\ 0 & t_{22} \end{bmatrix}
= \begin{bmatrix} t_{11}^2 & t_{11}t_{21}\\ t_{11}t_{21} & t_{21}^2 + t_{22}^2 \end{bmatrix}.
\]

Is this transformation one-to-one? Here $t_{11}^2 = x_{11} \Rightarrow t_{11} = \pm\sqrt{x_{11}}$. Hence we must have
$t_{11} > 0$ or strictly negative to have $t_{11}$ uniquely defined in terms of $x_{ij}$'s. Let $t_{11} > 0$. Then
$t_{11} = \sqrt{x_{11}}$ is uniquely defined. [Note that when $X$ is real positive definite then all diagonal
elements of $X$ must be positive. That is, $x_{jj} > 0$, $j = 1, \ldots, p$ for a $p \times p$ matrix.]
$x_{12} = t_{11}t_{21} \Rightarrow t_{21} = x_{12}/t_{11}$, or $t_{21}$ is uniquely defined. $x_{22} = t_{21}^2 + t_{22}^2 \Rightarrow t_{22} = \pm\sqrt{x_{22} - t_{21}^2}$. There
are two possible values for $t_{22}$. Hence if $t_{11} > 0$ and $t_{22} > 0$ the transformation is one-to-one.
This can be proved in general also for $X$ a $p \times p$ symmetric matrix which can
be written as $TT'$ where $T$ is $p \times p$ lower triangular with positive diagonal elements. In
this transformation $p(p + 1)/2$ elements, $t_{ij}$'s, $i \ge j$, in $T$ go to $p(p + 1)/2$ elements $x_{ij}$'s,
$i \ge j$, in $X$. Let us evaluate the Jacobian in this transformation. Let us look at the $2 \times 2$
case, from where the general case will be obvious. We have

\[ x_{11} = t_{11}^2, \quad x_{12} = t_{11}t_{21}, \quad x_{22} = t_{21}^2 + t_{22}^2. \]

Take the $x_{ij}$'s in the order $x_{11}, x_{12}, x_{22}$ and the $t_{ij}$'s in the order $t_{11}, t_{21}, t_{22}$ and form the
matrix of partial derivatives.

\[
\begin{array}{c|ccc}
 & t_{11} & t_{21} & t_{22}\\ \hline
x_{11} & \dfrac{\partial x_{11}}{\partial t_{11}} = 2t_{11} & \dfrac{\partial x_{11}}{\partial t_{21}} = 0 & \dfrac{\partial x_{11}}{\partial t_{22}} = 0\\[2mm]
x_{12} & * & \dfrac{\partial x_{12}}{\partial t_{21}} = t_{11} & \dfrac{\partial x_{12}}{\partial t_{22}} = 0\\[2mm]
x_{22} & * & * & \dfrac{\partial x_{22}}{\partial t_{22}} = 2t_{22}
\end{array}
\]

Since the Jacobian matrix is in a triangular form we will not be interested in the elements
marked by $*$ in the above configuration. The determinant is the product of the
diagonal elements. In this case it is $2^2 t_{11}^2 t_{22}$.
Suppose we have a $p \times p$ matrix $X$, symmetric and positive definite, and a $p \times p$
lower triangular matrix $T$ such that $X = TT'$ and $t_{jj} > 0$, $j = 1, \ldots, p$. Then when the
Jacobian matrix is formed by taking $x_{ij}$'s in the order $x_{11}, x_{12}, \ldots, x_{1p}, x_{22}, \ldots, x_{2p}, \ldots, x_{pp}$
and the $t_{ij}$'s in the order $t_{11}, t_{21}, \ldots, t_{p1}, t_{22}, \ldots, t_{p2}, \ldots, t_{pp}$ then we have the following
quantities in the diagonal of the Jacobian matrix. When $x_{11}, \ldots, x_{1p}$ are considered we
have one 2 and $t_{11}$ appearing $p$ times. When $x_{22}, \ldots, x_{2p}$ are considered we have one 2
and $t_{22}$ appearing $p - 1$ times, and so on. Hence the final result will be the following:

(v) If a symmetric matrix $X$ can be written as $TT'$ where $T$ is lower triangular with
positive diagonal elements then

\[
X = TT' \;\Rightarrow\; dX = 2^p\, t_{11}^{p}\, t_{22}^{p-1} \cdots t_{pp}\, dT = 2^p \left\{\prod_{j=1}^{p} t_{jj}^{p+1-j}\right\} dT.
\]
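Result (v) can be checked for a particular $T$ by building the matrix of partial derivatives explicitly. The sketch below uses $\partial x_{ij}/\partial t_{ab} = \delta_{ia}t_{jb} + \delta_{ja}t_{ib}$, which follows from $x_{ij} = \sum_k t_{ik}t_{jk}$, with the orderings described above; the sample matrix $T$ and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction to triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for k in range(n):
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            sign = -sign
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            m[r] = [x - factor * y for x, y in zip(m[r], m[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= m[k][k]
    return result

def cholesky_jacobian(T):
    """Jacobian matrix of X = TT' for lower triangular T (indices counted from 0).

    Rows follow x_11, x_12, ..., x_1p, x_22, ...; columns follow
    t_11, t_21, ..., t_p1, t_22, ...
    """
    p = len(T)
    xs = [(i, j) for i in range(p) for j in range(i, p)]
    ts = [(a, b) for b in range(p) for a in range(b, p)]
    return [[(T[j][b] if i == a else 0) + (T[i][b] if j == a else 0)
             for (a, b) in ts] for (i, j) in xs]

T = [[2, 0, 0],
     [5, 3, 0],
     [-1, 4, 7]]
p = len(T)

lhs = det(cholesky_jacobian(T))
rhs = 2 ** p
for j in range(p):
    rhs *= T[j][j] ** (p - j)      # exponent p + 1 - j when j is counted from 1
print(lhs, rhs)                    # 4032 4032
```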


This Jacobian has some very interesting applications, especially in evaluating some 
very complicated integrals. 


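Example 3.3.6. Evaluate the following multiple integral

\[
\int\!\!\int\!\!\int [x_{11}x_{22} - x_{12}^2]^{\alpha}\, e^{-(x_{11} + x_{22})}\, dx_{11} \wedge dx_{12} \wedge dx_{22}
\]

where $x_{11} > 0$, $x_{22} > 0$, $x_{11}x_{22} - x_{12}^2 > 0$.

Solution 3.3.6. Writing in matrix and determinant notations the integral that we want
to evaluate can be written as follows, observing that

\[
X = \begin{bmatrix} x_{11} & x_{12}\\ x_{12} & x_{22} \end{bmatrix}, \quad
|X| = \begin{vmatrix} x_{11} & x_{12}\\ x_{12} & x_{22} \end{vmatrix} = x_{11}x_{22} - x_{12}^2, \quad
\mathrm{tr}(X) = x_{11} + x_{22}:
\]

\[ \int_X |X|^{\alpha}\, e^{-\mathrm{tr}(X)}\, dX. \]

The conditions on $x_{ij}$'s imply that $X$ can be written as $TT'$ where $T$ is lower triangular
with positive diagonal elements. Consider the transformation

\[
X = TT' = \begin{bmatrix} t_{11}^2 & t_{11}t_{21}\\ t_{11}t_{21} & t_{21}^2 + t_{22}^2 \end{bmatrix}, \quad
T = \begin{bmatrix} t_{11} & 0\\ t_{21} & t_{22} \end{bmatrix}.
\]

Then

\[ dX = 2^2 t_{11}^2 t_{22}\, dT, \quad t_{11} > 0,\ t_{22} > 0. \]

Let the integral be denoted by $\gamma$. Then

\[
\gamma = \int_X |X|^{\alpha}\, e^{-\mathrm{tr}(X)}\, dX = \int_T |TT'|^{\alpha}\, e^{-\mathrm{tr}(TT')}\, 2^2 t_{11}^2 t_{22}\, dT.
\]

Note that

\[ |X| = |TT'| = t_{11}^2 t_{22}^2, \quad \mathrm{tr}(TT') = t_{11}^2 + t_{21}^2 + t_{22}^2. \]

Then

\[
\gamma = 2\int_0^{\infty} (t_{11}^2)^{\alpha + 1} e^{-t_{11}^2}\, dt_{11}
\times 2\int_0^{\infty} (t_{22}^2)^{\alpha + \frac{1}{2}} e^{-t_{22}^2}\, dt_{22}
\times \int_{-\infty}^{\infty} e^{-t_{21}^2}\, dt_{21}.
\]

Observe that $t_{11}$ and $t_{22}$ are restricted to be positive whereas $t_{21}$ is free to vary. We need
to evaluate only two types of integrals,

\[ 2\int_0^{\infty} (u^2)^{\beta} e^{-u^2}\, du \quad\text{and}\quad \int_{-\infty}^{\infty} e^{-z^2}\, dz, \]

for $\beta = \alpha + \frac{1}{2}$, $\alpha + 1$. Substituting $u^2 = w$ we have

\[
2\int_0^{\infty} (u^2)^{\beta} e^{-u^2}\, du = \int_0^{\infty} w^{\beta + \frac{1}{2} - 1} e^{-w}\, dw = \Gamma\left(\beta + \tfrac{1}{2}\right) \quad\text{for } \Re\left(\beta + \tfrac{1}{2}\right) > 0
\]

where $\Gamma(\cdot)$ is a gamma function and $\Re(\cdot)$ denotes the real part of $(\cdot)$. [For the sake of
those students who are unfamiliar with gamma functions a definition will be given
after the discussion of this example.]

\[
\begin{aligned}
\int_{-\infty}^{\infty} e^{-z^2}\, dz &= 2\int_0^{\infty} e^{-z^2}\, dz \quad (\text{since } e^{-z^2} \text{ is even and the integral exists})\\
&= \int_0^{\infty} w^{\frac{1}{2} - 1} e^{-w}\, dw \quad (\text{put } z^2 = w,\ w > 0)\\
&= \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}.
\end{aligned}
\]

Hence the answer is that

\[ \gamma = \sqrt{\pi}\,\Gamma\left(\alpha + \tfrac{3}{2}\right)\Gamma(\alpha + 1). \]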
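The factorization used in this solution can be checked numerically for a particular $\alpha$. The sketch below approximates the three one-dimensional integrals by a composite Simpson rule on a truncated range and compares their product with $\sqrt{\pi}\,\Gamma(\alpha + \frac{3}{2})\Gamma(\alpha + 1)$. The choice $\alpha = 1$, the cut-off at 12 and the number of subintervals are assumptions made for this illustration.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def half_line_integral(beta, upper=12.0):
    """2 * integral from 0 to upper of (u^2)^beta exp(-u^2) du; the tail beyond upper is negligible."""
    return 2 * simpson(lambda u: (u * u) ** beta * math.exp(-u * u), 0.0, upper)

alpha = 1.0
numeric = (half_line_integral(alpha + 1)          # t_11 integral
           * half_line_integral(alpha + 0.5)      # t_22 integral
           * 2 * simpson(lambda z: math.exp(-z * z), 0.0, 12.0))   # t_21 integral, using evenness
closed_form = math.sqrt(math.pi) * math.gamma(alpha + 1.5) * math.gamma(alpha + 1)
print(numeric, closed_form)    # the two values should agree to many decimal places
```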
Definition 3.3.2 (A gamma function $\Gamma(\alpha)$). It can be defined in many ways. $\Gamma(\alpha)$ is defined
for all $\alpha \ne 0, -1, -2, \ldots$. The standard notation used is $\Gamma(z)$ (gamma z or gamma of z;
it is a function of $z$). An integral representation is the following:

\[ \Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x}\, dx \quad\text{for } \Re(z) > 0. \]

For the convergence of the integral the condition $\Re(z) > 0$ is needed. If $z$ is real then
the condition reduces to $z > 0$. This condition is needed only if we are using an integral
representation. Otherwise the condition is $z \ne 0, -1, -2, \ldots$ A few properties which
follow from the definition itself are the following:

\[ \Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1) \quad\text{for } \Re(\alpha - 1) > 0. \tag{3.3.12} \]

This property is evident from the integral representation, by using integration by
parts. Extending this result we have

\[ \Gamma(\alpha) = (\alpha - 1)(\alpha - 2)\cdots(\alpha - r)\Gamma(\alpha - r), \quad \Re(\alpha - r) > 0. \tag{3.3.13} \]

\[ \Gamma(n) = (n - 1)! \quad\text{when } n \text{ is a positive integer.} \tag{3.3.14} \]

The next result can be established by considering a double integral.

\[ \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \tag{3.3.15} \]
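Properties (3.3.12) to (3.3.15) are easy to spot-check with the gamma function in Python's standard `math` module. The values of $\alpha$ and $r$ below are arbitrary choices for the illustration.

```python
import math

alpha = 4.7
r = 3

# (3.3.12): Gamma(alpha) = (alpha - 1) Gamma(alpha - 1)
check_recurrence = math.isclose(math.gamma(alpha), (alpha - 1) * math.gamma(alpha - 1))

# (3.3.13): Gamma(alpha) = (alpha - 1)(alpha - 2)...(alpha - r) Gamma(alpha - r)
product = 1.0
for k in range(1, r + 1):
    product *= alpha - k
check_extended = math.isclose(math.gamma(alpha), product * math.gamma(alpha - r))

# (3.3.14): Gamma(n) = (n - 1)! for positive integers n
check_factorial = all(math.isclose(math.gamma(n), math.factorial(n - 1))
                      for n in range(1, 15))

# (3.3.15): Gamma(1/2) = sqrt(pi)
check_half = math.isclose(math.gamma(0.5), math.sqrt(math.pi))

print(check_recurrence, check_extended, check_factorial, check_half)   # True True True True
```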


3.3.4 Functions of matrix argument 


Consider a $p \times p$ matrix $X$. We can define several scalar functions on $X$. For example

(a) $f_1(X) = |X|$ = determinant of $X$
(b) $f_2(X) = 2|X|^2 - 3|X| + 5$
(c) $f_3(X) = \mathrm{tr}(X) = x_{11} + \cdots + x_{pp}$ = trace of $X$

are all scalar functions of $X$. We could have also defined matrix functions. For example,

(a) $g_1(X) = [I - X]^{-1}$
(b) $g_2(X) = I + 3X + X^2 - 5X^3$

are matrix functions of $X$. We are interested in real-valued scalar functions of matrix
argument in this subsection, that is, functions of the types $f_1$, $f_2$, $f_3$ above.
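One of the very basic functions in the theory of scalar functions of matrix argument is
a matrix-variate gamma, analogous to Definition 3.3.2. Since the algebra can get very
involved we will only introduce a matrix-variate gamma and stop the discussion. Consider
the following integral, denoting it by

\[ \Gamma_p(\alpha) = \int_X |X|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(X)}\, dX \]

where $X$ is $p \times p$ such that it can be expressed in the form $X = TT'$ where $T$ is lower
triangular with positive diagonal elements. Then making the transformation $X = TT'$
we have the Jacobian from property (v). That is,

\[
\begin{aligned}
\Gamma_p(\alpha) &= \int_X |X|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(X)}\, dX\\
&= \int_T |TT'|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(TT')}\, 2^p \Bigl\{\prod_{j=1}^{p} t_{jj}^{p+1-j}\Bigr\}\, dT\\
&= \int_T \Bigl\{\prod_{j=1}^{p} (t_{jj}^2)^{\alpha - \frac{p+1}{2}}\Bigr\}\, 2^p \Bigl\{\prod_{j=1}^{p} t_{jj}^{p+1-j}\Bigr\}\, e^{-(t_{11}^2 + t_{21}^2 + t_{22}^2 + \cdots + t_{pp}^2)}\, dT\\
&= \Bigl\{\prod_{j=1}^{p} 2\int_0^{\infty} (t_{jj}^2)^{\alpha - \frac{j}{2}}\, e^{-t_{jj}^2}\, dt_{jj}\Bigr\} \Bigl\{\prod_{i > j} \int_{-\infty}^{\infty} e^{-t_{ij}^2}\, dt_{ij}\Bigr\}.
\end{aligned}
\]

Evaluating with the help of the gamma integral of Definition 3.3.2 we have

\[
\Gamma_p(\alpha) = \pi^{\frac{p(p-1)}{4}}\, \Gamma(\alpha)\, \Gamma\left(\alpha - \tfrac{1}{2}\right) \Gamma(\alpha - 1) \cdots \Gamma\left(\alpha - \tfrac{p-1}{2}\right)
\]

for $\Re(\alpha) > \frac{p-1}{2}$.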


Definition 3.3.3. A real matrix-variate gamma: Notation $\Gamma_p(\alpha)$ (gamma p alpha),

\[
\begin{aligned}
\Gamma_p(\alpha) &= \pi^{\frac{p(p-1)}{4}}\, \Gamma(\alpha)\, \Gamma\left(\alpha - \tfrac{1}{2}\right) \cdots \Gamma\left(\alpha - \tfrac{p-1}{2}\right), \quad \Re(\alpha) > \tfrac{p-1}{2}\\
&= \int_{X = X' > 0} |X|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(X)}\, dX, \quad \Re(\alpha) > \tfrac{p-1}{2}.
\end{aligned}
\]

Since the integral on the right gives $\Gamma_p(\alpha)$, if we divide both sides by $\Gamma_p(\alpha)$ we can create
a matrix-variate statistical density out of this function, known as the real matrix-variate
gamma density.

Definition 3.3.4. A real matrix-variate gamma density

\[
f(X) = \frac{1}{\Gamma_p(\alpha)}\, |X|^{\alpha - \frac{p+1}{2}}\, e^{-\mathrm{tr}(X)}, \quad X = X' > 0,\ \Re(\alpha) > \tfrac{p-1}{2},
\]

where the matrix is $p \times p$ symmetric and can be written in the form $X = TT'$. The notation
$U = U' > 0$ means the matrix $U$ is symmetric positive definite. This concept of
definiteness will be introduced properly later on. For the time being take it as meaning
that $X$ can be expressed in the form $X = TT'$ where $T$ is lower triangular with positive
diagonal elements.
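For real $\alpha$ the closed form in Definition 3.3.3 is straightforward to evaluate. The following sketch, with the illustrative name `gamma_p`, computes it from the ordinary gamma function and checks two special cases.

```python
import math

def gamma_p(p, alpha):
    """Real matrix-variate gamma of Definition 3.3.3, for real alpha > (p - 1)/2."""
    if alpha <= (p - 1) / 2:
        raise ValueError("need alpha > (p - 1)/2")
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)     # Gamma(alpha), Gamma(alpha - 1/2), ...
    return value

# p = 1 gives back the ordinary gamma function
print(math.isclose(gamma_p(1, 2.5), math.gamma(2.5)))        # True
# p = 2, alpha = 3/2: sqrt(pi) * Gamma(3/2) * Gamma(1) = pi/2
print(math.isclose(gamma_p(2, 1.5), math.pi / 2))            # True
```

With $p = 2$, the integral of Example 3.3.6 is $\Gamma_2(\alpha + \frac{3}{2})$, which agrees with the answer $\sqrt{\pi}\,\Gamma(\alpha + \frac{3}{2})\Gamma(\alpha + 1)$ obtained there.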
3.3.5 Partitioned determinants and multiple correlation coefficient

The idea of partitioning a matrix was introduced in Chapter 2. Let us examine the effect
of partitioning on determinants. Let an $n \times n$ matrix $A$ be partitioned as follows:

\[ A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix} \]

where $A_{11}$ is $r \times r$ thereby $A_{12}$ is $r \times (n - r)$, $A_{21}$ is $(n - r) \times r$, $A_{22}$ is $(n - r) \times (n - r)$. Let us
evaluate the determinant of $A$ in terms of the determinants of the submatrices. Recall
the steps in the actual evaluation of a determinant. We were adding suitable multiples
of rows (columns) to other rows (columns) to reduce the matrix to a triangular form or
to a block triangular form. Instead of adding one row (column) at a time we could have
added suitable multiples of a block of rows to another block of rows. The result would
have been the same. Suppose that we want to bring a null matrix at the position of $A_{21}$.
What suitable combination of the first block of rows, namely $(A_{11}, A_{12})$, is to be added to
the second block of rows, $(A_{21}, A_{22})$, so that a null matrix can be produced at the place
of $A_{21}$? A suitable multiple is $-A_{21}A_{11}^{-1}$ times the first block $(A_{11}, A_{12})$ to be added to the
second block $(A_{21}, A_{22})$. The value of the determinant remains the same. [Remember
to keep the order of multiplication of the matrices involved.] Then

\[
|A| = \begin{vmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{vmatrix}
= \begin{vmatrix} A_{11} & A_{12}\\ O & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{vmatrix}.
\]

This can be done if $A_{11}$ is nonsingular. Since the above is a triangular block matrix its
determinant is the product of the determinants of the diagonal blocks. That is,

\[ |A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}| \quad\text{for } |A_{11}| \ne 0. \tag{3.3.16} \]

From symmetry it follows that

\[ |A| = |A_{22}|\,|A_{11} - A_{12}A_{22}^{-1}A_{21}| \quad\text{for } |A_{22}| \ne 0. \tag{3.3.17} \]

In (3.3.16) and (3.3.17) the submatrices enter into a cyclic order. If we start with $A_{11}$ then
it goes $A_{22} - A_{21}A_{11}^{-1}A_{12}$ and if we start with $A_{22}$ then it goes $A_{11} - A_{12}A_{22}^{-1}A_{21}$. A major
advantage of the formulae (3.3.16) and (3.3.17) is that the orders of the determinants
on the right are reduced to $r$ and $(n - r)$ both of which will be less than the order $n$ on
the left when $1 \le r < n$. Thus the computations are made a little bit easier. For example
if we have a $16 \times 16$ determinant the evaluation can be reduced to the evaluation of
two $8 \times 8$ determinants; the latter will be considerably easier, but the penalty is that
one of the matrices involves a product and an inverse.
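Example 3.3.7. Evaluate the following $4 \times 4$ determinant by partitioning into $2 \times 2$
blocks. Repeat the process with a partition where $A_{11}$ is $1 \times 1$.

\[
|A| = \begin{vmatrix} 1 & 0 & 1 & -1\\ 0 & 1 & 1 & 1\\ 1 & 1 & 0 & 0\\ 1 & -1 & 1 & 1 \end{vmatrix}.
\]

Solution 3.3.7. Let

\[
A_{11} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}, \quad
A_{12} = \begin{bmatrix} 1 & -1\\ 1 & 1 \end{bmatrix}, \quad
A_{21} = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}, \quad
A_{22} = \begin{bmatrix} 0 & 0\\ 1 & 1 \end{bmatrix}.
\]

Then $|A_{11}| \ne 0$ whereas $|A_{22}| = 0$. By using the formula (3.3.16) we have

\[ |A| = |A_{11}|\,|A_{22} - A_{21}A_{11}^{-1}A_{12}|. \]

Since $A_{11} = I_2$, $A_{11}^{-1} = I_2$, $|A_{11}| = 1$,

\[
A_{22} - A_{21}A_{11}^{-1}A_{12}
= \begin{bmatrix} 0 & 0\\ 1 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1\\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0\\ 1 & 1 \end{bmatrix} - \begin{bmatrix} 2 & 0\\ 0 & -2 \end{bmatrix}
= \begin{bmatrix} -2 & 0\\ 1 & 3 \end{bmatrix},
\]

\[
|A_{22} - A_{21}A_{11}^{-1}A_{12}| = \begin{vmatrix} -2 & 0\\ 1 & 3 \end{vmatrix} = -6 \;\Rightarrow\; |A| = -6.
\]

When $A_{11}$ is $1 \times 1$ we have $A_{11} = 1$, $|A_{11}| = 1$, $A_{11}^{-1} = 1$. Again using the formula (3.3.16)
we have

\[
\begin{aligned}
A_{22} - A_{21}A_{11}^{-1}A_{12} &= \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 0\\ -1 & 1 & 1 \end{bmatrix} - \begin{bmatrix} 0\\ 1\\ 1 \end{bmatrix}[1]\,[0, 1, -1]\\
&= \begin{bmatrix} 1 & 1 & 1\\ 1 & 0 & 0\\ -1 & 1 & 1 \end{bmatrix} - \begin{bmatrix} 0 & 0 & 0\\ 0 & 1 & -1\\ 0 & 1 & -1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ -1 & 0 & 2 \end{bmatrix};
\end{aligned}
\]

\[
\begin{aligned}
|A_{22} - A_{21}A_{11}^{-1}A_{12}| &= \begin{vmatrix} 1 & 1 & 1\\ 1 & -1 & 1\\ -1 & 0 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & 1 & 1\\ 0 & -2 & 0\\ 0 & 1 & 3 \end{vmatrix} \quad [-1(1) + (2);\ (1) + (3)]\\
&= \begin{vmatrix} -2 & 0\\ 1 & 3 \end{vmatrix} = -6 \;\Rightarrow\; |A| = -6.
\end{aligned}
\]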

Note 3.1. If partitioning technique is used to evaluate a determinant then select the 
submatrices appropriately so that the computations can be minimized. 
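Formula (3.3.16) translates directly into a short program. The sketch below evaluates the determinant of Example 3.3.7 both directly and through the two partitions used in the solution, in exact arithmetic. The helper names (`det`, `inverse`, `matmul`, `det_by_partition`) are illustrative choices.

```python
from fractions import Fraction

def det(rows):
    """Determinant by row reduction to triangular form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign = 1
    for k in range(n):
        pivot = next((r for r in range(k, n) if m[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            m[k], m[pivot] = m[pivot], m[k]
            sign = -sign
        for r in range(k + 1, n):
            factor = m[r][k] / m[k][k]
            m[r] = [x - factor * y for x, y in zip(m[r], m[k])]
    result = Fraction(sign)
    for k in range(n):
        result *= m[k][k]
    return result

def inverse(rows):
    """Gauss-Jordan inverse in exact arithmetic; raises if the matrix is singular."""
    n = len(rows)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(rows)]
    for k in range(n):
        pivot = next((r for r in range(k, n) if aug[r][k] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        aug[k], aug[pivot] = aug[pivot], aug[k]
        aug[k] = [x / aug[k][k] for x in aug[k]]
        for r in range(n):
            if r != k:
                factor = aug[r][k]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[k])]
    return [row[n:] for row in aug]

def matmul(P, Q):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Q)] for row in P]

def det_by_partition(A, r):
    """|A| = |A11| |A22 - A21 A11^{-1} A12| as in (3.3.16); requires |A11| != 0."""
    A11 = [row[:r] for row in A[:r]]
    A12 = [row[r:] for row in A[:r]]
    A21 = [row[:r] for row in A[r:]]
    A22 = [row[r:] for row in A[r:]]
    correction = matmul(matmul(A21, inverse(A11)), A12)
    schur = [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(A22, correction)]
    return det(A11) * det(schur)

A = [[1, 0, 1, -1],
     [0, 1, 1, 1],
     [1, 1, 0, 0],
     [1, -1, 1, 1]]
print(det(A), det_by_partition(A, 2), det_by_partition(A, 1))   # -6 -6 -6
```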


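We can obtain a very interesting result when $A_{11}$ or $A_{22}$ is $1 \times 1$. Let the $p \times p$ matrix
$V = (v_{ij})$ be partitioned as follows:

\[ V = \begin{bmatrix} v_{11} & V_{12}\\ V_{21} & V_{22} \end{bmatrix} \]

where $V_{11} = v_{11}$ is $1 \times 1$ thereby $V_{12}$ is $1 \times (p - 1)$, $V_{21}$ is $(p - 1) \times 1$ and $V_{22}$ is $(p - 1) \times (p - 1)$.
Let $v_{11} \ne 0$, $|V_{22}| \ne 0$. Then from (3.3.16) and (3.3.17) we have

\[ |V| = v_{11}\left|V_{22} - \frac{V_{21}V_{12}}{v_{11}}\right| \tag{a} \]
\[ \phantom{|V|} = |V_{22}|\,[v_{11} - V_{12}V_{22}^{-1}V_{21}]. \tag{b} \]

That is, the scalar quantity,

\[
v_{11} - V_{12}V_{22}^{-1}V_{21} = \frac{|V|}{|V_{22}|} = \frac{v_{11}}{|V_{22}|}\left|V_{22} - \frac{V_{21}V_{12}}{v_{11}}\right|. \tag{3.3.18}
\]

But

\[
\left|V_{22} - \frac{V_{21}V_{12}}{v_{11}}\right| = |V_{22}|\,\left|I - \frac{1}{v_{11}}V_{22}^{-1}V_{21}V_{12}\right|.
\]

Comparing with (a) and (b) above we have

\[
v_{11} - V_{12}V_{22}^{-1}V_{21} = v_{11}\left|I - \frac{1}{v_{11}}V_{22}^{-1}V_{21}V_{12}\right|. \tag{3.3.19}
\]

The beauty of the relationship is that on one side we have a scalar quantity whereas
on the other side we have a $(p - 1) \times (p - 1)$ determinant. This formula is often used
in statistical and other problems to reduce a $(p - 1) \times (p - 1)$ determinant to a scalar
quantity. Also from (a) and (b) above we have

\[ 1 - \frac{V_{12}V_{22}^{-1}V_{21}}{v_{11}} = \frac{|V|}{v_{11}|V_{22}|} \tag{c} \]
\[ \Rightarrow\quad \frac{V_{12}V_{22}^{-1}V_{21}}{v_{11}} = 1 - \frac{|V|}{v_{11}|V_{22}|}. \]

When $V$ is a variance-covariance matrix associated with a real vector random variable
$X = (x_1, \ldots, x_p)'$ then $V_{12}V_{22}^{-1}V_{21}/v_{11}$ is called the square of the multiple correlation
coefficient of $x_1$ on $(x_2, \ldots, x_p)$ and it is usually denoted by

\[
\rho^2_{1(2\ldots p)} = \frac{V_{12}V_{22}^{-1}V_{21}}{v_{11}} = v_{11}^{-\frac{1}{2}}\,V_{12}V_{22}^{-1}V_{21}\,v_{11}^{-\frac{1}{2}}. \tag{3.3.20}
\]

One can show $\rho_{1(2\ldots p)}$ to be the maximum correlation between $x_1$ and an arbitrary linear
function of $x_2, \ldots, x_p$. In prediction problems when a variable such as $x_1$ is predicted
by using a linear function of other variables such as $x_2, \ldots, x_p$ then $\rho_{1(2\ldots p)}$ is often used
to measure how good the predictor is, in the sense that the larger the value of $\rho_{1(2\ldots p)}$ the better the
predictor.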

If $V_{11}$ is $p_1 \times p_1$, $V_{22}$ is $p_2 \times p_2$ such that $p_1 + p_2 = p$ then (3.3.20) is no longer a scalar
quantity. If we write the last expression in (3.3.20) with $V_{11}$ a $p_1 \times p_1$ matrix, that is,

\[ P = V_{11}^{-\frac{1}{2}}\,V_{12}V_{22}^{-1}V_{21}\,V_{11}^{-\frac{1}{2}} \tag{3.3.21} \]

where $V_{11}^{\frac{1}{2}}$ indicates a positive definite square root of $V_{11}$, then $P$ in (3.3.21) is known
as the canonical correlation matrix which plays a vital role in canonical correlation
analysis. This field is also mainly concerned with prediction problems, predicting a
set of variables with the help of another set of variables, a generalization of the first
situation where one scalar variable is predicted by using a set of other variables. These
areas are very rich in real-life situations where matrices and determinants play very
important roles.

Other concepts associated with the concept of multiple correlation and canoni- 
cal correlations are the concepts of partial correlations, correlation ratios and partial 
correlation matrices which come into regression problems, residual analysis, model 
building and other prediction and estimation problems. These quantities can be writ- 
ten up in terms of partitioned matrices and the corresponding determinants. 


Example 3.3.8. Compute the multiple correlation coefficient of x, on (x2, x3) from the 
following variance-covariance matrix of X, where X! = (x,,x5, X3): 


2 1 
V=|1 3 
1 2 


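Solution 3.3.8. Let $v_{11} = 2$. Taking the expression from (c) above

$$\rho^2_{1.(2,3)} = \frac{V_{12}V_{22}^{-1}V_{21}}{v_{11}} = 1 - \frac{|V|}{v_{11}|V_{22}|},$$

$$|V_{22}| = \begin{vmatrix}3&2\\2&5\end{vmatrix} = 11;$$

$$|V| = \begin{vmatrix}2&1&1\\1&3&2\\1&2&5\end{vmatrix} = 2\begin{vmatrix}3&2\\2&5\end{vmatrix} - 1\begin{vmatrix}1&2\\1&5\end{vmatrix} + 1\begin{vmatrix}1&3\\1&2\end{vmatrix} = (2)(11) - 3 + (-1) = 18.$$

Hence

$$\frac{|V|}{v_{11}|V_{22}|} = \frac{18}{(2)(11)} = \frac{9}{11} \quad\Rightarrow\quad \rho^2_{1.(2,3)} = 1 - \frac{9}{11} = \frac{2}{11}.$$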
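The two routes to $\rho^2_{1.(2,3)}$ used in Solution 3.3.8 can be checked numerically. What follows is a minimal Python sketch, not part of the original text. It assumes the covariance matrix with rows (2, 1, 1), (1, 3, 2), (1, 2, 5), as read off from the arithmetic of the worked solution, and the helper name `det` is purely illustrative. Exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


# Covariance matrix of Example 3.3.8 (assumed, reconstructed from the solution).
V = [[2, 1, 1],
     [1, 3, 2],
     [1, 2, 5]]

v11 = V[0][0]
V12 = V[0][1:]                      # row vector (1, 1); V21 is its transpose
V22 = [row[1:] for row in V[1:]]    # lower-right 2 x 2 block

# Route 1: rho^2 = 1 - |V| / (v11 |V22|)
rho_sq_det = 1 - Fraction(det(V), v11 * det(V22))

# Route 2: rho^2 = V12 V22^{-1} V21 / v11, with the 2 x 2 inverse written out
(a, b), (c, d) = V22
d22 = det(V22)
V22_inv = [[Fraction(d, d22), Fraction(-b, d22)],
           [Fraction(-c, d22), Fraction(a, d22)]]
quad = sum(V12[i] * V22_inv[i][j] * V12[j] for i in range(2) for j in range(2))
rho_sq_direct = quad / v11

print(rho_sq_det, rho_sq_direct)  # 2/11 2/11
```

Both routes give 2/11, in agreement with the hand computation.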


3.3.6 Maxima/minima problems 


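One of the problems in multivariable calculus is to look for maxima/minima of a function of many variables. Let $f(X)$ be a scalar function of the $p \times 1$ vector $X$ of real variables. In Chapters 1 and 2 we have defined the differential operators

$$\frac{\partial}{\partial X} = \begin{pmatrix}\frac{\partial}{\partial x_1}\\ \vdots\\ \frac{\partial}{\partial x_p}\end{pmatrix},\qquad \frac{\partial}{\partial X'} = \left(\frac{\partial}{\partial x_1},\ldots,\frac{\partial}{\partial x_p}\right),$$

$$\frac{\partial}{\partial X}\frac{\partial}{\partial X'} = \begin{pmatrix}\frac{\partial^2}{\partial x_1^2} & \cdots & \frac{\partial^2}{\partial x_1\partial x_p}\\ \vdots & & \vdots\\ \frac{\partial^2}{\partial x_p\partial x_1} & \cdots & \frac{\partial^2}{\partial x_p^2}\end{pmatrix}.$$

Then $\frac{\partial}{\partial X}$ operating on a function $f$ equated to a null vector gives the critical points, and $\frac{\partial}{\partial X}\frac{\partial}{\partial X'}f$ evaluated at these critical points will decide whether each critical point corresponds to a local maximum or a local minimum or something else.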
Example 3.3.9. Check for maxima/minima in the following function

$$f = x_1^2 + 2x_2^2 + x_3^2 - 2x_1x_2 - x_2x_3 + 2x_1 + 4x_2 + x_3 + 8.$$


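Solution 3.3.9. Consider

$$\frac{\partial f}{\partial X} = O \Rightarrow \begin{pmatrix}2x_1 - 2x_2 + 2\\ -2x_1 + 4x_2 - x_3 + 4\\ -x_2 + 2x_3 + 1\end{pmatrix} = \begin{pmatrix}0\\0\\0\end{pmatrix}.$$

Let us solve the equations by looking at the augmented coefficient matrix and performing elementary operations, which reduce it to

$$\left(\begin{array}{ccc|c}1&-1&0&-1\\0&-1&2&-1\\0&0&3&-8\end{array}\right) \Rightarrow x_3 = -\frac{8}{3},\quad x_2 = -\frac{13}{3},\quad x_1 = -\frac{16}{3}.$$

There is only one critical point $(x_1, x_2, x_3) = (-\frac{16}{3}, -\frac{13}{3}, -\frac{8}{3})$. This point may correspond to a local maximum or a local minimum or neither. Consider the matrix of second order partial derivatives operating on $f$.

$$\frac{\partial f}{\partial X'} = [2x_1 - 2x_2 + 2,\ -2x_1 + 4x_2 - x_3 + 4,\ -x_2 + 2x_3 + 1],$$

$$\frac{\partial}{\partial X}\frac{\partial f}{\partial X'} = \begin{pmatrix}2&-2&0\\-2&4&-1\\0&-1&2\end{pmatrix}.$$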


For a minimum this matrix at the critical point must be positive definite and for a 
maximum it should be negative definite. We will define definiteness of matrices in 
terms of determinants next. Definiteness can also be defined equivalently in terms of 
other quantities. 


Definition 3.3.5 (Definiteness of an $n \times n$ real symmetric matrix $A$). It is defined only for non-null square symmetric matrices when real. Consider all the leading minors of $A$. Let the leading minors be denoted by $|M_1|,\ldots,|M_n|$. Then $A$ is positive definite if $|M_j| > 0$, $j = 1,\ldots,n$ (positive semi-definite if the minors can be zero also); negative definite if $|M_1| < 0$, $|M_2| > 0$, $|M_3| < 0,\ldots$ (negative semi-definite if the minors can be zero also); and $A$ is indefinite if some of the minors are negative and some positive, at least one of each sign, and $A$ does not belong to the above types.


For our Example 3.3.9 let us look for the definiteness of our matrix of second or- 
der partial derivatives, evaluated at the critical points. Since our matrix is free of the 
variables the matrix evaluated at the critical point is itself. Let us look at the leading 
or principal minors. 


$$|M_1| = 2 > 0,\qquad |M_2| = \begin{vmatrix}2&-2\\-2&4\end{vmatrix} = 4 > 0,$$

$$|M_3| = \begin{vmatrix}2&-2&0\\-2&4&-1\\0&-1&2\end{vmatrix} = 2\begin{vmatrix}4&-1\\-1&2\end{vmatrix} - (-2)\begin{vmatrix}-2&-1\\0&2\end{vmatrix} = (2)(7) - 8 = 6 > 0.$$


Hence the matrix is positive definite. Therefore the critical point corresponds to a min- 
imum. 
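The steps of Example 3.3.9 (gradient equal to the null vector, then the leading minors of the matrix of second order derivatives) can be replayed in a few lines. This is an illustrative Python sketch with hypothetical helper names (`det`, `gradient`); the gradient components and the matrix of second order derivatives are the ones written out in the solution.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def gradient(x1, x2, x3):
    """Partial derivatives of f with respect to x1, x2, x3."""
    return (2 * x1 - 2 * x2 + 2,
            -2 * x1 + 4 * x2 - x3 + 4,
            -x2 + 2 * x3 + 1)


# Matrix of second order partial derivatives (free of the variables).
H = [[2, -2, 0],
     [-2, 4, -1],
     [0, -1, 2]]

critical = (Fraction(-16, 3), Fraction(-13, 3), Fraction(-8, 3))
grad_at_critical = gradient(*critical)

# Leading minors |M_1|, |M_2|, |M_3| of Definition 3.3.5.
leading_minors = [det([row[:k] for row in H[:k]]) for k in range(1, 4)]
is_positive_definite = all(m > 0 for m in leading_minors)

print(leading_minors)         # [2, 4, 6]
print(is_positive_definite)   # True
```

The gradient vanishes at the critical point and all three leading minors are positive, which is the condition used above for a minimum.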

Another definition of definiteness of matrices in terms of eigenvalues will be in- 
troduced in the next chapter. Another one in terms of quadratic forms will be given 
next. 


Definition 3.3.6 (Definiteness of an $n \times n$ non-null real symmetric matrix $A$). Consider a quadratic form $u = X'AX$, $A = A'$ where $X$ is an $n \times 1$ non-null vector and $A$ is the real symmetric matrix under consideration. If $u > 0$ for all possible non-null $X$ then $A$ is positive definite ($u \ge 0$ means positive semidefinite). If $u < 0$ for all possible non-null $X$ then $A$ is negative definite ($u \le 0$ means negative semidefinite). If $u > 0$ for some values of $X$ and $u < 0$ for some other values of $X$ then $A$ is indefinite.


Note 3.2. In order to avoid confusion and possible misinterpretation, one should take 
Definition 3.3.6 as the definition for definiteness in the real case and all other proper- 
ties are to be treated as consequences. 


Exercises 3.3 


3.3.1. Evaluate the cross product à x b for the following cases: (1) á = i= j - K, b= 
2i + 3j —k; (2) a=i, b=j; 3) a@=i, b=k. 


3.3.2. Construct a vector parallel to the line of intersection of the planes 
X+yV+Z=2, 2x-3y=z=5. 


3.3.3. Evaluate the volume of the parallelepiped generated by the following points 
with the origin: 


(1) (1,1,-1), (1,2,5), (3,2, -1), 2) (2,1,-1), (41, 2), (3,2, 1). 


3.3.4. Evaluate the volume of (a) the parallelotope, (b) the simplex generated by the 
following points with the origin. 


(1) (55111, (L-L121, (11-1,1,-1), 
(2,1,3,-1,1), (1,0,0, 0, 2) 
(2) (LL11, (LL-L-1, (L-LL-1, (L-1-1-1) 

3.3.5. Evaluate the volume of (a) the parallelotope, (b) the simplex, determined by 
the points 


(1,2,1,1) (1,3,1,2), (2,1,1,1), (23,410, (1,-10,1) 
3.3.6. Evaluate the Jacobians in the following linear transformations: 


(à  yi72G5-X?,*X5 Y2=X -X + 2X3, 
ys = 2X1 - X) - X3; 
(D — Yyun-2Xu*X*op Yn 72Xp + XQ 
Yn-^Xut33 yn-Xpt3X» 
(c) Yu = Xy tX54 t2Xjp tX» Yn 2X4 + Xy — Xy - Xn» 
Yn = Xy + 3X% +X + 3X2 Yn = Xy + 3X% -Xy — 3X». 
3.3.7. Let X = X' be a p x p symmetric matrix of p(p + 1)/2 real variables. Let E and F 


be two basic elementary matrices of the E and F types (see Chapter 2). Evaluate the 
Jacobians in the following transformations: 


(a) Y=EXE', (b) Y=EXF’. 


3.3.8. Let $A$ be a $p \times p$ nonsingular matrix of constants, $X$ a $p \times p$ symmetric matrix of $p(p + 1)/2$ real variables. Evaluate the Jacobian in the linear transformation $Y = AXA'$ and show that $\mathrm{d}Y = |A|^{p+1}\mathrm{d}X$, ignoring the sign. [Hint: A nonsingular matrix of the type $A$ is a product of the elementary matrices of $E$ and $F$ types.]


3.3.9. Evaluate the Jacobian in the following nonlinear transformation. Let $x_j > 0$, $j = 1,\ldots,p$. Let $y_1 = x_1 + \cdots + x_p$, $y_2 = x_1x_2 + x_1x_3 + \cdots + x_{p-1}x_p$, $\ldots$, $y_p = x_1x_2\cdots x_p$. [These are the basic elementary symmetric functions].


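3.3.10. Evaluate the Jacobian in the following generalized polar coordinate transformation:

$$x_1 = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{k-2}\sin\theta_{k-1},$$
$$x_2 = r\sin\theta_1\sin\theta_2\cdots\sin\theta_{k-2}\cos\theta_{k-1},$$
$$x_3 = r\sin\theta_1\sin\theta_2\cdots\cos\theta_{k-2},$$
$$\vdots$$
$$x_{k-1} = r\sin\theta_1\cos\theta_2,$$
$$x_k = r\cos\theta_1,$$

where $0 < \theta_j \le \pi$, $j = 1,2,\ldots,k-2$, $0 < \theta_{k-1} \le 2\pi$, $0 \le r < \infty$.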
3.3.11. Evaluate the integral $\int_X e^{-\mathrm{tr}(X)}\,\mathrm{d}X$ where $X$ is $p \times p$, $X = X'$ and $X$ can be written


as $X = TT'$ where $T$ is lower triangular with positive diagonal elements. $\int_X$ means the 
integral over all such $X$. 

3.3.12. Evaluate the integral I e- (@+bX dy a > 0, b + 0 and x is real scalar. 


3.3.13. Let $X$ be a $p \times 1$ vector of real scalar variables. Let $A$ be a $p \times p$ constant matrix such that $A$ can be written as $A = BB'$ with $|B| \ne 0$. Evaluate the following integrals:


$$\text{(a)}\ \int_X e^{-X'AX}\,\mathrm{d}X,\qquad \text{(b)}\ \int_X e^{-(X-\mu)'A(X-\mu)}\,\mathrm{d}X$$

where $\mu$ is a constant vector. [Hint: Use the transformation of the type $Y = B'X$.]


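3.3.14. Let $X = \binom{X_{(1)}}{X_{(2)}}$, where $X_{(1)}$ is $r \times 1$ and $X$ is $p \times 1$, and let $A = BB'$, with $|B| \ne 0$, be a $p \times p$ matrix of constants. Evaluate the integral $\int_{X_{(2)}} e^{-X'AX}\,\mathrm{d}X_{(2)}$, that is, integrate out the variables in $X_{(2)}$.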


3.3.15. Multivariate Gaussian density. The most popular density in multivariate statistical analysis is the multivariate Gaussian density. Let $X$ be a $p \times 1$ vector of real scalar random variables, $\mu$ a $p \times 1$ vector of constants, $V$ a $p \times p$ real symmetric positive definite matrix, that is, one which can be written as $V = BB'$, $|B| \ne 0$. Then the $p$-variate Gaussian density is given by

$$f(X) = \frac{1}{(2\pi)^{p/2}|V|^{1/2}}\,e^{-\frac{1}{2}(X-\mu)'V^{-1}(X-\mu)}$$

for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, with $X' = (x_1,\ldots,x_p)$, $\mu' = (\mu_1,\ldots,\mu_p)$, $V = (v_{ij})$. Show the following: (a) $f(X)$ is a density, that is to say that $f(X) \ge 0$ for all $X$ and $\int_X f(X)\,\mathrm{d}X = 1$; (b) $E(X) = \mu$, that is to say that $\int_X Xf(X)\,\mathrm{d}X = \mu$; (c) Covariance matrix of $X$ is $V$, that is to say that

$$V = E[(X - \mu)(X - \mu)'] = \int_X (X - \mu)(X - \mu)'f(X)\,\mathrm{d}X.$$


3.3.16. Canonical form for a quadratic form. Let $u = X'AX$ be a quadratic form. Without loss of generality $A = A'$. From elementary transformations it was seen in Chapter 2 that $A$ can be written as $A = QDQ'$ where $Q$ is nonsingular and $D$ is diagonal. Then if $Y = Q'X$ the quadratic form reduces to its canonical form, a linear function of squares of the form $u = \lambda_1y_1^2 + \cdots + \lambda_py_p^2$ when $A$ is $p \times p$. Reduce the following quadratic forms to their canonical forms:

(a) u =2x? + 3d + 2x2 + 2x4x; — 2x13; 

(b) u =x? 42x06 + 210 + DHX = De. 


3.3.17. Show that $u = 1$ in Exercise 3.3.16(a) can be reduced to an ellipsoid in the standard form.

3.3.18. Show that $u = 1$ in Exercise 3.3.16(b) is not an ellipsoid.


3.3.19. Write the following bilinear forms in matrix notation as X' AY: 
(a) Uy 2 Xy, + 2oyi + 2X3y1 — Xy? + Xsyo; 
(b) Uy 2xyyi + 2x5y; + 2X3y1 — X3y3 + 2Xy3 + X3y3 — X3y3- 

3.3.20. Through elementary transformations an $m \times n$ matrix $A$ can be written as $A = PDQ$ where $P$ and $Q$ are nonsingular matrices and $D$ is a diagonal type matrix. By using this property reduce the bilinear forms in (a) and (b) in Exercise 3.3.19 to the form

$$\lambda_1z_1t_1 + \cdots + \lambda_pz_pt_p.$$


3.3.21. Show that Definitions 3.3.5 and 3.3.6 are equivalent for real symmetric matri- 
ces. 


3.3.22. By using Definitions 3.3.5 and 3.3.6 show the following: 

(a) If a real square matrix $A$ is positive definite then it can always be written as $A = BB'$, $|B| \ne 0$.

(b) If $B$ is a real rectangular matrix $m \times n$ then $A = BB'$ is either positive definite or positive semidefinite.

(c) What should be the condition on $B$ in (b) above so that $A$ is strictly positive definite?

(d) Show that a negative definite or indefinite matrix $A$ cannot be written in the form $A = CC'$ for some matrix $C$.


3.3.23. Evaluate A”, B?9, C^? and their determinants, where 


-1 
A- |a al B-f? AF ee ; l 
0 0 0 1 0 1 


3.3.24. Let $A$ and $B$ be $n \times n$ matrices. If $A$ and $B$ differ only in their $j$-th column then show that

$$2^{1-n}|A + B| = |A| + |B|.$$
120 10 
3.3.25. Let A= [o T «| and B = A". Compute B. 


3.3.26. Let J' = (1,1,...,n) and let A=I-B, B= iy Compute |(AB)?|. 


3.3.27. Let $A$ be a $3 \times 3$ matrix with all principal minors positive. Show that $|A|$ can be written as $|A| = a^2b^2c^2$ for some $a, b, c$.


3.3.28. By using Definition 3.3.6, or otherwise, show that if $A = (a_{ij})$, $A = A'$, $n \times n$, is positive definite then $a_{jj} > 0$, $j = 1,\ldots,n$ and if $A$ is negative definite then $a_{jj} < 0$, $j = 1,\ldots,n$. Note that these are necessary properties but not sufficient to talk about positive definiteness or negative definiteness.


3.3.29. If $A = A'$ is $n \times n$ then prove that $A$ is indefinite if at least one diagonal element is negative and at least one diagonal element is positive.

3.3.30. Let $A = (a_{ij})$ be $n \times n$. Consider the determinant $|A - xI|$ where $x$ is a scalar quantity and $I$ is an $n \times n$ identity matrix. That is, $x$ is subtracted from the diagonal elements $a_{jj}$, $j = 1,\ldots,n$. Prove that when $|A - xI|$ is opened up with $n!$ terms using (3.1.2), $x^{n-1}$ can come only from one term, namely, $(a_{11} - x)(a_{22} - x)\cdots(a_{nn} - x)$, and there is no other term of degree $n - 1$ in $x$ out of the $n!$ terms.


4 Eigenvalues and eigenvectors 


4.0 Introduction 


Among all the concepts introduced so far none may have as many applications as the 
concept of eigenvalues. Before introducing the possible fields of applications we will 
define the concept, study some of the main properties and then we will have a suf- 
ficient set of properties and tools to tackle practical problems where these are ap- 
plicable. Fields of applications include theoretical developments of many branches 
of mathematics, physics, statistics, econometrics and many other areas, and real-life 
problems. 


Definition 4.0.1 (Eigenvalues). Eigenvalues are defined only for square matrices. Let $A$ be an $n \times n$ matrix. Consider the equation

$$AX = \lambda X \tag{4.0.1}$$

where $\lambda$ is a scalar and $X$ is an $n \times 1$ vector. This equation is evidently satisfied by a null vector $X$. If the equation has a solution for a $\lambda$ and for a non-null $X$ then that $\lambda$ is called an eigenvalue or characteristic root or latent root of $A$ and the non-null $X$ satisfying (4.0.1) for that particular $\lambda$ is called the eigenvector or characteristic vector or latent vector corresponding to that eigenvalue $\lambda$.


Let us examine the equation a bit more closely.

$$AX = \lambda X \Rightarrow (A - \lambda I)X = O. \tag{4.0.2}$$

If this homogeneous linear system $(A - \lambda I)X = O$ has to have a non-null solution $X$ then $A - \lambda I$ must be singular. If $A - \lambda I$ is nonsingular then its regular inverse exists and then multiplying both sides by $(A - \lambda I)^{-1}$ we have the only solution as $X = O$. If $A - \lambda I$ is singular then its determinant must be zero. That is,

$$|A - \lambda I| = 0. \tag{4.0.3}$$


The eigenvalues can also be defined as the roots of the determinantal equation in 
(4.0.3). 


Definition 4.0.2 (Eigenvalues of an $n \times n$ matrix $A$). They are the $n$ roots of the determinantal equation (4.0.3).


4.1 Eigenvalues of special matrices 


Let us evaluate the eigenvalues of some special matrices. 


Open Access. © 2017 Arak M. Mathai, Hans J. Haubold, published by De Gruyter. This work is licensed 
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. 
https://doi.org/10.1515/9783110562507-004 

Example 4.1.1. Determine the eigenvalues of (1) a diagonal matrix, (2) a triangular 
matrix. 


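Solution 4.1.1. Let $D = \mathrm{diag}(d_1,\ldots,d_n)$ be a diagonal matrix. Consider the equation

$$|D - \lambda I| = 0 \Rightarrow \begin{vmatrix}d_1 - \lambda & 0 & \cdots & 0\\ 0 & d_2 - \lambda & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & d_n - \lambda\end{vmatrix} = 0.$$

This determinantal equation is nothing but

$$(d_1 - \lambda)(d_2 - \lambda)\cdots(d_n - \lambda) = 0.$$

That is, the $n$ roots are $\lambda_1 = d_1, \lambda_2 = d_2,\ldots,\lambda_n = d_n$ and these are the eigenvalues of $D$. Now consider a triangular matrix, for example, a lower triangular matrix $T$. Then

$$|T - \lambda I| = \begin{vmatrix}t_{11} - \lambda & 0 & \cdots & 0\\ t_{21} & t_{22} - \lambda & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ t_{n1} & t_{n2} & \cdots & t_{nn} - \lambda\end{vmatrix} = 0.$$

Since the determinant of a triangular matrix (lower or upper) is the product of the diagonal elements, the determinantal equation reduces to

$$(t_{11} - \lambda)\cdots(t_{nn} - \lambda) = 0.$$

Hence the $n$ eigenvalues in this case are $\lambda_1 = t_{11},\ldots,\lambda_n = t_{nn}$.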


(i) The eigenvalues of a diagonal matrix are its diagonal elements. 

(ii) The eigenvalues of a triangular (upper or lower) matrix are its diagonal ele- 
ments. 

(iii) The eigenvalues of a scalar matrix with the diagonal elements $c$ each are $c$ repeated $n$ times. The eigenvalues of an identity matrix $I_n$ are 1 repeated $n$ times.


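Example 4.1.2. Evaluate the eigenvalues of

$$A = \begin{pmatrix}1&3\\0&1\end{pmatrix}\quad\text{and}\quad B = \begin{pmatrix}2&2\\1&3\end{pmatrix}.$$

Solution 4.1.2. Consider

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix}1 - \lambda & 3\\ 0 & 1 - \lambda\end{vmatrix} = 0 \Rightarrow (1 - \lambda)(1 - \lambda) = 0 \Rightarrow \lambda_1 = 1,\ \lambda_2 = 1$$

are the eigenvalues. It follows directly from property (ii) also since $A$ is triangular.

(iv) If the eigenvalues of a matrix are 1 repeated $n$ times this does not mean that the matrix is an identity matrix.

Now consider

$$|B - \lambda I| = 0 \Rightarrow \begin{vmatrix}2 - \lambda & 2\\ 1 & 3 - \lambda\end{vmatrix} = 0 \Rightarrow (2 - \lambda)(3 - \lambda) - 2 = 0.$$

That is,

$$\lambda^2 - 5\lambda + 4 = 0 \Rightarrow (\lambda - 4)(\lambda - 1) = 0 \Rightarrow \lambda_1 = 4,\ \lambda_2 = 1$$

are the eigenvalues.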
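For a $2 \times 2$ matrix the determinantal equation $|A - \lambda I| = 0$ opens up to $\lambda^2 - \mathrm{tr}(A)\lambda + |A| = 0$, so the eigenvalues follow from the quadratic formula. The short Python sketch below illustrates this; the function name `eigenvalues_2x2` is hypothetical, and the two test matrices are assumed to be the ones of Example 4.1.2, with rows (1, 3), (0, 1) and (2, 2), (1, 3).

```python
import cmath


def eigenvalues_2x2(m):
    """Roots of lambda^2 - tr(m) * lambda + |m| = 0 for a 2 x 2 matrix m."""
    (a, b), (c, d) = m
    trace = a + d
    determinant = a * d - b * c
    # cmath.sqrt also handles a negative discriminant (complex roots).
    root = cmath.sqrt(trace * trace - 4 * determinant)
    return (trace + root) / 2, (trace - root) / 2


A = [[1, 3], [0, 1]]   # triangular: eigenvalues are the diagonal elements
B = [[2, 2], [1, 3]]

eig_A = eigenvalues_2x2(A)
eig_B = eigenvalues_2x2(B)
print(eig_A)  # ((1+0j), (1+0j))
print(eig_B)  # ((4+0j), (1+0j))
```

The results are returned as complex numbers with zero imaginary part because `cmath.sqrt` is used, which keeps the same function usable when the roots are not real.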


Let us see what happens if we have an idempotent matrix. Idempotent means $A = A^2$. Consider the equation

$$AX = \lambda X.$$

Premultiply both sides by $A$. Then we have

$$A^2X = \lambda AX = \lambda(\lambda X) = \lambda^2X.$$

But $A^2 = A$ and then

$$AX = \lambda X = \lambda^2X \Rightarrow (\lambda - \lambda^2)X = O.$$

But by definition $X \ne O$ and $\lambda$ is a scalar. Hence $\lambda - \lambda^2 = 0$ which means the roots are 0 or 1.


(v) The eigenvalues of an idempotent matrix are 0’s and 1's. 
Identity matrix is an idempotent matrix with all eigenvalues 1 each. We can have a 
triangular matrix with the diagonal elements 0’s and 1’s which means that for such a 


triangular matrix also the eigenvalues are 0’s and 1’s. 


(vi) If the eigenvalues of a matrix are 0’s and 1’s that does not necessarily mean that 
the matrix is idempotent. 
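Properties (v) and (vi) can be illustrated with small triangular matrices, whose eigenvalues are simply their diagonal elements. The matrices `P`, `T` and `N` below are hypothetical examples chosen for this sketch, not taken from the text.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def is_idempotent(m):
    """True when m * m equals m."""
    return matmul(m, m) == m


P = [[1, 0], [0, 0]]   # eigenvalues 1, 0 and P^2 = P
T = [[1, 1], [0, 1]]   # eigenvalues 1, 1 but T^2 = [[1, 2], [0, 1]]
N = [[0, 1], [0, 0]]   # eigenvalues 0, 0 but N^2 is the null matrix

print(is_idempotent(P), is_idempotent(T), is_idempotent(N))  # True False False
```

So eigenvalues consisting of 0's and 1's are necessary for idempotency but, as `T` and `N` show, not sufficient.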


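Example 4.1.3. Evaluate the eigenvalues of the matrix

$$A = \begin{pmatrix}\frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}\\ \vdots & \vdots & & \vdots\\ \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n}\end{pmatrix}.$$

[This is the matrix of the quadratic form $n(\bar{x})^2$, $\bar{x} = (x_1 + \cdots + x_n)/n$ the average of the $n$ quantities $x_1,\ldots,x_n$.]

Solution 4.1.3.

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix}\frac{1}{n} - \lambda & \frac{1}{n} & \cdots & \frac{1}{n}\\ \frac{1}{n} & \frac{1}{n} - \lambda & \cdots & \frac{1}{n}\\ \vdots & \vdots & & \vdots\\ \frac{1}{n} & \frac{1}{n} & \cdots & \frac{1}{n} - \lambda\end{vmatrix} = 0.$$

Add all the rows to the first row, the value of the determinant remains the same, take out $(1 - \lambda)$ from the first row and then add $(-\frac{1}{n})$ times the first row to all other rows to obtain

$$|A - \lambda I| = (1 - \lambda)\begin{vmatrix}1 & 1 & \cdots & 1\\ 0 & -\lambda & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & -\lambda\end{vmatrix} = (1 - \lambda)(-\lambda)^{n-1}.$$

Hence the roots are $\lambda_1 = 1$, $\lambda_2 = 0 = \cdots = \lambda_n$. That is, one root is 1 and all other roots are 0 each. In this example we can show that our matrix $A$ is idempotent by showing $A^2 = A$ also.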


Let us examine the determinantal equation $|A - \lambda I| = 0$. If $\lambda_1,\ldots,\lambda_n$ are the $n$ roots of this equation then

$$|A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda)\cdots(\lambda_n - \lambda). \tag{4.1.1}$$

The right side of (4.1.1), when opened up, is a polynomial in $\lambda$ of degree $n$. That is,

$$|A - \lambda I| = (-1)^n\lambda^n + (-1)^{n-1}(\lambda_1 + \cdots + \lambda_n)\lambda^{n-1} + (-1)^{n-2}[\text{sum of products of roots taken two at a time}]\lambda^{n-2} + \cdots + \lambda_1\lambda_2\cdots\lambda_n. \tag{4.1.2}$$

Definition 4.1.1 (The characteristic polynomial of an $n \times n$ matrix $A$). When $|A - \lambda I|$ is written as a polynomial in $\lambda$, as on the right of (4.1.2), this polynomial is called the characteristic polynomial of $A$.

Treating (4.1.1) as a polynomial in $\lambda$ and evaluating it at $\lambda = 0$ we have the following result:

(vii) The determinant of an $n \times n$ matrix $A$ is the product of the eigenvalues of $A$.


Some immediate consequences of this property are the following:

(viii) If at least one of the eigenvalues of $A$ is zero then $|A| = 0$ or $A$ is singular, and if $A$ is nonsingular then all the eigenvalues of $A$ are nonzeros (may be positive or negative but none zero).

From (4.1.2) we note that the coefficient of $(-\lambda)^{n-1}$ in the expansion of $|A - \lambda I|$ is the sum of the eigenvalues. We will derive a very important result connecting the sum of the eigenvalues to the trace of the matrix. To this end let us evaluate $|A - \lambda I|$. Let $B = (b_{ij})$ be an $n \times n$ matrix. Then we had seen from Chapter 3 that

$$|B| = \sum_{i_1}\cdots\sum_{i_n}(-1)^{\rho(i_1,\ldots,i_n)}b_{1i_1}b_{2i_2}\cdots b_{ni_n}, \tag{4.1.3}$$

where $(i_1,\ldots,i_n)$ runs over the permutations of $(1,\ldots,n)$ and $(-1)^{\rho(i_1,\ldots,i_n)}$ is the sign of the permutation.


That is, when $|B|$ is written as an explicit sum, each term in that sum contains one and only one element from each row and each column. Now, look at the determinant

$$|A - \lambda I| = \begin{vmatrix}a_{11} - \lambda & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} - \lambda & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda\end{vmatrix}.$$

One term in this determinant, when the determinant is written as a sum of the type in (4.1.3), is

$$(a_{11} - \lambda)(a_{22} - \lambda)\cdots(a_{nn} - \lambda).$$

This, when opened up, gives a term containing $\lambda^n$, a term containing $\lambda^{n-1}$ and so on. The coefficient of $(-\lambda)^{n-1}$ here is

$$a_{11} + \cdots + a_{nn} = \mathrm{tr}(A).$$

What is the nature of any other term in the sum when $|A - \lambda I|$ is written as a sum? One element, other than $a_{11} - \lambda$, has to come from the first row. Let this be the $j$-th element $a_{1j}$. Then this rules out the presence of $a_{jj} - \lambda$, the term in the $j$-th column containing $\lambda$. In other words, in all other terms the exponent of $\lambda$ can be only up to $\lambda^{n-2}$. Now equating the coefficient of $(-\lambda)^{n-1}$ on both sides of (4.1.2) we see that the trace of $A$ is the sum of the eigenvalues of $A$ also. Thus we have the following important result:


(ix) For any $n \times n$ matrix $A = (a_{ij})$

$$\mathrm{tr}(A) = a_{11} + \cdots + a_{nn} = \lambda_1 + \cdots + \lambda_n$$

where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of $A$.

This does not mean that $\lambda_1 = a_{11}$, etc. Thus the determinant of $A$ is the product of the eigenvalues and the trace is the sum of the eigenvalues of $A$. For example,

For the eigenvalues    the matrix is    its trace is    its determinant is

1, -1, 2               nonsingular      2               -2
0, 1, 3                singular         4               0
1, 1, 1, 1             nonsingular      4               1
1, -1, 1, -1           nonsingular      0               1
1, 0, 1, -1            singular         1               0


Example 4.1.4. Verify the results (vii) and (ix) from Examples 4.1.1 to 4.1.3.

Solution 4.1.4. For the diagonal and triangular matrices the results are obvious since the diagonal elements themselves are the eigenvalues. In Example 4.1.2 the eigenvalues of $A$ are 1, 1, $\mathrm{tr}(A) = 1 + 1 = 2$, $|A| = 1$. In $B$, $\mathrm{tr}(B) = 2 + 3 = 5$, the sum of the eigenvalues is $4 + 1 = 5$ and $|B| = (2)(3) - (1)(2) = 4$. The product of the eigenvalues is $(4)(1) = 4$. In Example 4.1.3, $\mathrm{tr}(A) = \frac{1}{n} + \cdots + \frac{1}{n} = 1$. The sum of the eigenvalues is $1 + 0 + \cdots + 0 = 1$. Since $A$ is singular $|A| = 0$. The product of the eigenvalues is $(1)(0)\cdots(0) = 0$.
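The checks of Solution 4.1.4 can also be automated. The following Python sketch is only an illustration under stated assumptions: the $2 \times 2$ matrix is taken to have rows (2, 2), (1, 3) with eigenvalues 4 and 1, the matrix of Example 4.1.3 is taken with $n = 4$ (all elements 1/4, eigenvalues 1, 0, 0, 0), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import prod


def det(m):
    """Determinant by cofactor expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def trace(m):
    """Sum of the diagonal elements."""
    return sum(m[i][i] for i in range(len(m)))


n = 4
cases = {
    "B": ([[2, 2], [1, 3]], [4, 1]),
    "averaging matrix": ([[Fraction(1, n)] * n for _ in range(n)], [1, 0, 0, 0]),
}

# For each matrix: (trace, sum of eigenvalues, determinant, product of eigenvalues)
summary = {name: (trace(m), sum(eig), det(m), prod(eig))
           for name, (m, eig) in cases.items()}

for name, (tr, s, d, p) in summary.items():
    print(name, tr == s, d == p)  # both comparisons are True for each case
```

In both cases the trace equals the sum and the determinant equals the product of the eigenvalues, as stated in (ix) and (vii).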


Since $A$ and $A'$ have the same determinant, and $(A - \lambda I)' = A' - \lambda I$, we have

$$|A - \lambda I| = |A' - \lambda I|.$$

Hence we have the following result:

(x) The matrices $A$ and $A'$ have the same eigenvalues.


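What are the eigenvalues of an orthonormal matrix? An orthonormal matrix $P$ is such that $PP' = I$, $P'P = I$. Let $\lambda$ be an eigenvalue of $P$ and $X$ the corresponding eigenvector. Then

$$PX = \lambda X.$$

Since the elements of $P$ are real, taking the conjugate transpose gives $X^*P' = \bar{\lambda}X^*$, where $X^*$ denotes the conjugate transpose of $X$. Multiplying the two equations and using $P'P = I$ we have

$$X^*X = X^*P'PX = \bar{\lambda}\lambda X^*X.$$

This means $(|\lambda|^2 - 1)X^*X = 0$ where $X^*X > 0$ since $X \ne O$. Therefore $|\lambda| = 1$; in particular, if $\lambda$ is real then $\lambda = \pm 1$.

(xi) The eigenvalues of an orthonormal matrix have absolute value 1; its real eigenvalues are $\pm 1$.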

(xii) For $n \times n$ matrices $A$ and $B$, $|AB|$ = product of the eigenvalues of $A$ and $B$.

(xiii) tr(A + B) = tr(A) + tr(B) = sum of the eigenvalues of A and B, when A + B is 
defined. 

(xiv) The eigenvalues of a null matrix are zeros.

(xv) If $\lambda$ is an eigenvalue of $A$ then $k\lambda$ is an eigenvalue of $kA$ where $k$ is a scalar quantity.

(xvi) If $\lambda$ is an eigenvalue of $A$ then $1 + k\lambda$ is an eigenvalue of $I + kA$ where $k$ is a scalar quantity.

(xvii) If $A = QBQ^{-1}$ then

$$|A - \lambda I| = |QBQ^{-1} - \lambda I| = |Q(B - \lambda I)Q^{-1}| = |Q|\,|B - \lambda I|\,|Q|^{-1} = |B - \lambda I|$$

and therefore $A$ and $B$ have the same eigenvalues.

(xviii) If $A = A'$ and if $A = PDP'$, where $P$ is orthonormal and $D$ is diagonal then the eigenvalues of $A$ are the diagonal elements in $D$.
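Property (xvii) can be seen at work on a small case. In the Python sketch below the matrix `B` has rows (2, 2), (1, 3) and `Q` is an arbitrary nonsingular matrix chosen only for illustration; both, together with the helper names, are assumptions of this sketch. For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - \mathrm{tr}\,\lambda + \det$, so comparing these coefficients compares the eigenvalues.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Exact inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def char_poly_2x2(m):
    """Coefficients (1, -trace, determinant) of lambda^2 - tr*lambda + det."""
    (a, b), (c, d) = m
    return (1, -(a + d), a * d - b * c)


B = [[2, 2], [1, 3]]
Q = [[1, 2], [3, 5]]          # hypothetical nonsingular matrix, |Q| = -1
A = matmul(matmul(Q, B), inverse_2x2(Q))

print(char_poly_2x2(A) == char_poly_2x2(B))  # True
```

Here $A = QBQ^{-1}$ turns out to be lower triangular with diagonal elements 4 and 1, so its eigenvalues can be read off directly and agree with those of `B`.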


Exercises 4.1 


4.1.1. Evaluate the eigenvalues of the following matrices: 


a=; j 
1 2 


4.1.2. Evaluate the eigenvalues of the following matrices: 


1 1 1 O 1 -1 2k od 
A-|1 -1 1], B=]2 0 1], C=]|1 3 OJ, 
-1 0 1 1 1 -1 10 1 
O 1 1 
D=|-1 0 2 
-1 -2 0 


4.1.3. Let U and V be n x 1 non-null, non-orthogonal vectors. Show that at least one 
eigenvalue of UV' is zero and no eigenvalue of U'V is zero. What are the eigenvalues 
in each? 


4.1.4. Show that the eigenvalues of the n x n matrix A, where 


are one zero and n - 1 unities, the eigenvalues of B are one unity and n - 1 zeros and 
the eigenvalues of I - B are of the form 1 — eigenvalues of B. 


4.1.5. Construct a $2 \times 2$ matrix with real elements but whose eigenvalues are irrational.

4.1.6. Construct a $2 \times 2$ matrix with real elements but whose eigenvalues are complex quantities. Can there be only one or an odd number of complex roots for any given matrix with complex roots?


4.1.7. If tr(A) = 0 is A null? If not give two counter examples. 
4.1.8. If AB = O is tr(AB) = 0? If tr(AB) = 0 is AB null? If not give two counter examples. 


4.1.9. If $A + B = I_n$, is the sum of the eigenvalues of $A$ and $B$ equal to $n$? If not give two counter examples.


4.1.10. If A = PDQ where P and Q are nonsingular and D is diagonal, are the eigen- 
values of A the diagonal elements in D? If not give two counter examples in the 2 x 2 
case. 


4.2 Eigenvectors 


An eigenvector is defined along with an eigenvalue in Section 4.0. Here we will redefine it for the sake of completeness.


4.2.1 Some definitions and examples 


Definition 4.2.1. If $\lambda$ is an eigenvalue of $A$ then any non-null vector $X$ satisfying the equation

$$AX = \lambda X$$

is an eigenvector of $A$ corresponding to the eigenvalue $\lambda$.


Some properties are immediate from this definition itself. 


(i) If $X_1$ is an eigenvector corresponding to the eigenvalue $\lambda_1$ then $cX_1$, $c$ a nonzero scalar, is also an eigenvector corresponding to $\lambda_1$. If $X_1$ and $X_2$ are two eigenvectors corresponding to the same eigenvalue $\lambda_1$ then $c_1X_1 + c_2X_2$ is also an eigenvector corresponding to $\lambda_1$, where $c_1$ and $c_2$ are scalars such that $c_1X_1 + c_2X_2$ is non-null.

(ii) If $\lambda = 0$ is an eigenvalue of $A$ then the eigenvector corresponding to $\lambda = 0$ is a vector in the null space of $A$. There are $n - r$ such linearly independent eigenvectors if the rank of $A$ is $r$ and if $A$ is $n \times n$.


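Example 4.2.1. Compute the eigenvalues and the eigenvectors of the matrix

$$B = \begin{pmatrix}2&2\\1&3\end{pmatrix}.$$

Solution 4.2.1. The eigenvalues of this matrix are 4 and 1, evaluated in Example 4.1.2. Take $\lambda_1 = 1$ and consider the equation

$$BX = \lambda_1X \Rightarrow (B - \lambda_1I)X = O \Rightarrow \begin{pmatrix}2 - 1 & 2\\ 1 & 3 - 1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow x_1 + 2x_2 = 0.$$

Both rows give the same equation. Since by definition $B - \lambda_1I$ is singular we cannot get two linearly independent equations here:

$$x_1 + 2x_2 = 0 \Rightarrow X = \begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}-2\\1\end{pmatrix}.$$

This is one solution. Any nonzero scalar multiple of it is also a solution, so there are plenty of solutions. Hence one eigenvector corresponding to $\lambda_1 = 1$, denoted by $X_1$, is

$$X_1 = \begin{pmatrix}-2\\1\end{pmatrix}.$$

For $\lambda_2 = 4$,

$$(B - \lambda_2I)X = O \Rightarrow \begin{pmatrix}2 - 4 & 2\\ 1 & 3 - 4\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow -2x_1 + 2x_2 = 0 \text{ and } x_1 - x_2 = 0 \Rightarrow x_1 = x_2.$$

For example, for $x_1 = 1$ we have $x_2 = 1$ and $X_2 = \binom{1}{1}$ is an eigenvector corresponding to $\lambda_2 = 4$. A normalized eigenvector corresponding to $\lambda_2 = 4$ is

$$Y_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}.$$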

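Let us create a matrix of eigenvectors and see what happens:

$$BX_1 = \lambda_1X_1 \quad\text{and}\quad BX_2 = \lambda_2X_2.$$

Putting the two equations together we have,

$$B(X_1, X_2) = (X_1, X_2)\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} \Rightarrow \begin{pmatrix}2&2\\1&3\end{pmatrix}\begin{pmatrix}-2&1\\1&1\end{pmatrix} = \begin{pmatrix}-2&1\\1&1\end{pmatrix}\begin{pmatrix}1&0\\0&4\end{pmatrix}.$$

Note that $X_1$ is multiplied by $\lambda_1$ and $X_2$ is multiplied by $\lambda_2$: when the matrix $(X_1, X_2)$ is postmultiplied by a diagonal matrix, the columns are multiplied by the corresponding diagonal elements. From the above equation we have the following:

$$B(X_1, X_2) = (X_1, X_2)\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} \Rightarrow B = (X_1, X_2)\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}(X_1, X_2)^{-1} \text{ if the inverse exists}$$

$$\Rightarrow \begin{pmatrix}2&2\\1&3\end{pmatrix} = \begin{pmatrix}-2&1\\1&1\end{pmatrix}\begin{pmatrix}1&0\\0&4\end{pmatrix}\begin{pmatrix}-2&1\\1&1\end{pmatrix}^{-1}.$$

Let us verify this:

$$\begin{pmatrix}-2&1\\1&1\end{pmatrix}^{-1} = \frac{1}{3}\begin{pmatrix}-1&1\\1&2\end{pmatrix}.$$

Then

$$\begin{pmatrix}-2&1\\1&1\end{pmatrix}\begin{pmatrix}1&0\\0&4\end{pmatrix}\frac{1}{3}\begin{pmatrix}-1&1\\1&2\end{pmatrix} = \frac{1}{3}\begin{pmatrix}6&6\\3&9\end{pmatrix} = \begin{pmatrix}2&2\\1&3\end{pmatrix}.$$

It is verified. What we have seen is the following: The eigenvectors in this case are linearly independent and if we denote the matrix of eigenvectors by $Q$, $Q = (X_1, X_2)$ above, which is nonsingular here, then

$$BQ = QD,$$

where $D$ is a diagonal matrix with the eigenvalues being the diagonal elements. Therefore

$$B = QDQ^{-1}. \tag{4.2.1}$$

We will investigate this aspect a little further later on.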
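The verification of (4.2.1) amounts to two matrix identities, $BQ = QD$ and $QDQ^{-1} = B$, which can be checked exactly. This Python sketch assumes the matrix with rows (2, 2), (1, 3), the eigenvectors $(-2, 1)'$ and $(1, 1)'$ as the columns of `Q`, and the eigenvalues 1 and 4; the helper names are illustrative only.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Exact inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


B = [[2, 2], [1, 3]]
Q = [[-2, 1], [1, 1]]   # columns are the eigenvectors X1 and X2
D = [[1, 0], [0, 4]]    # eigenvalues in the same order as the columns of Q

Q_inv = inverse_2x2(Q)
lhs = matmul(B, Q)                      # B Q
rhs = matmul(Q, D)                      # Q D
rebuilt = matmul(matmul(Q, D), Q_inv)   # Q D Q^{-1}

print(lhs == rhs, rebuilt == B)  # True True
```

The order of the eigenvalues in `D` must match the order of the eigenvector columns in `Q`; swapping one without the other breaks the identity.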


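Example 4.2.2. Evaluate the eigenvalues and eigenvectors of the matrix

$$A = \begin{pmatrix}2&1\\2&2\end{pmatrix}.$$

Solution 4.2.2.

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix}2 - \lambda & 1\\ 2 & 2 - \lambda\end{vmatrix} = 0 \Rightarrow (2 - \lambda)^2 - 2 = 0 \Rightarrow \lambda^2 - 4\lambda + 2 = 0.$$

The roots of this quadratic equation are available as $\lambda_1 = 2 + \sqrt{2}$, $\lambda_2 = 2 - \sqrt{2}$,

$$\left[ax^2 + bx + c = 0 \Rightarrow x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\right].$$

The eigenvalues are irrational here. In order to compute one eigenvector corresponding to the eigenvalue $\lambda_1 = 2 + \sqrt{2}$ we consider the equation

$$(A - \lambda_1I)X = O \Rightarrow \begin{pmatrix}2 - (2 + \sqrt{2}) & 1\\ 2 & 2 - (2 + \sqrt{2})\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow \begin{pmatrix}-\sqrt{2} & 1\\ 2 & -\sqrt{2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$

Note that if we multiply the first row by $-\sqrt{2}$ we get the second row. Hence we need to consider only any one of the two equations. [The rows have to be dependent because the matrix $A - \lambda_1I$ is singular.] Take the first equation:

$$-\sqrt{2}x_1 + x_2 = 0 \Rightarrow x_2 = \sqrt{2}x_1.$$

One solution is $x_1 = 1$, $x_2 = \sqrt{2}$ or

$$X_1 = \begin{pmatrix}1\\ \sqrt{2}\end{pmatrix}.$$

The normalized $X_1$ is

$$Y_1 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\ \sqrt{2}\end{pmatrix}.$$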


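Now consider the equation corresponding to the second eigenvalue $\lambda_2 = 2 - \sqrt{2}$.

$$(A - \lambda_2I)X = O \Rightarrow \begin{pmatrix}2 - (2 - \sqrt{2}) & 1\\ 2 & 2 - (2 - \sqrt{2})\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow \begin{pmatrix}\sqrt{2} & 1\\ 2 & \sqrt{2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$

As before, the second equation is a multiple of the first equation. Taking the first equation we have $\sqrt{2}x_1 + x_2 = 0$ which gives $x_2 = -\sqrt{2}x_1$. For example $x_1 = 1$, $x_2 = -\sqrt{2}$ is a solution:

$$X_2 = \begin{pmatrix}1\\ -\sqrt{2}\end{pmatrix}.\quad\text{The normalized } X_2 \text{ is } Y_2 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\ -\sqrt{2}\end{pmatrix}.$$

The matrix of eigenvectors $X_1$ and $X_2$ is then

$$Q = (X_1, X_2) = \begin{pmatrix}1 & 1\\ \sqrt{2} & -\sqrt{2}\end{pmatrix}.$$

Then we have the equation

$$AQ = Q\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} = \begin{pmatrix}1 & 1\\ \sqrt{2} & -\sqrt{2}\end{pmatrix}\begin{pmatrix}2 + \sqrt{2} & 0\\ 0 & 2 - \sqrt{2}\end{pmatrix}.$$

This means that $A$, in this case, can be written as

$$A = QDQ^{-1} = \begin{pmatrix}1 & 1\\ \sqrt{2} & -\sqrt{2}\end{pmatrix}\begin{pmatrix}2 + \sqrt{2} & 0\\ 0 & 2 - \sqrt{2}\end{pmatrix}\begin{pmatrix}1 & 1\\ \sqrt{2} & -\sqrt{2}\end{pmatrix}^{-1},\qquad Q^{-1} = \frac{1}{2\sqrt{2}}\begin{pmatrix}\sqrt{2} & 1\\ \sqrt{2} & -1\end{pmatrix}.$$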


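Example 4.2.3. Evaluate the eigenvalues and eigenvectors of the matrix

$$A = \begin{pmatrix}1 & -2\\ 1 & 2\end{pmatrix}.$$

Solution 4.2.3. Consider the equation

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix}1 - \lambda & -2\\ 1 & 2 - \lambda\end{vmatrix} = 0 \Rightarrow (1 - \lambda)(2 - \lambda) + 2 = 0 \Rightarrow \lambda^2 - 3\lambda + 4 = 0.$$

The roots are

$$\lambda = \frac{3 \pm \sqrt{3^2 - 4(4)}}{2} = \frac{3}{2} \pm i\frac{\sqrt{7}}{2},\quad i = \sqrt{-1}.$$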


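Both the roots are complex here. [Complex roots and irrational roots can only come in pairs.] Let

$$\lambda_1 = \frac{3}{2} + i\frac{\sqrt{7}}{2},\qquad \lambda_2 = \frac{3}{2} - i\frac{\sqrt{7}}{2}.$$

The eigenvector corresponding to $\lambda_1$ is available by solving

$$[A - \lambda_1I]X = O \Rightarrow \begin{pmatrix}1 - (\frac{3}{2} + i\frac{\sqrt{7}}{2}) & -2\\ 1 & 2 - (\frac{3}{2} + i\frac{\sqrt{7}}{2})\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow \begin{pmatrix}-\frac{1}{2} - i\frac{\sqrt{7}}{2} & -2\\ 1 & \frac{1}{2} - i\frac{\sqrt{7}}{2}\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$

It is not obvious whether the second equation is a scalar multiple of the first equation or not. Let us multiply the first equation by $-\frac{1}{4}(1 - i\sqrt{7})$. Then we get $[1, \frac{1}{2} - i\frac{\sqrt{7}}{2}]$, which is the second row. (Exercise for the student.) Consider the second equation

$$x_1 + \left(\frac{1}{2} - i\frac{\sqrt{7}}{2}\right)x_2 = 0.$$

For example, for $x_2 = 1$, $x_1 = -\frac{1}{2} + i\frac{\sqrt{7}}{2}$. One eigenvector corresponding to the eigenvalue $\lambda_1 = \frac{3}{2} + i\frac{\sqrt{7}}{2}$ is then

$$X_1 = \begin{pmatrix}-\frac{1}{2} + i\frac{\sqrt{7}}{2}\\ 1\end{pmatrix}.$$

Now consider $\lambda_2 = \frac{3}{2} - i\frac{\sqrt{7}}{2}$ and the equation

$$[A - \lambda_2I]X = O.$$

Proceeding as above one eigenvector is easily seen to be

$$X_2 = \begin{pmatrix}-\frac{1}{2} - i\frac{\sqrt{7}}{2}\\ 1\end{pmatrix}.$$

Can we have a diagonalization of $A$ by using $Q = (X_1, X_2)$? Here

$$Q = (X_1, X_2) = \begin{pmatrix}-\frac{1}{2}(1 - i\sqrt{7}) & -\frac{1}{2}(1 + i\sqrt{7})\\ 1 & 1\end{pmatrix},$$

$$Q^{-1} = \frac{1}{i\sqrt{7}}\begin{pmatrix}1 & \frac{1}{2}(1 + i\sqrt{7})\\ -1 & -\frac{1}{2}(1 - i\sqrt{7})\end{pmatrix},\qquad D = \begin{pmatrix}\frac{1}{2}(3 + i\sqrt{7}) & 0\\ 0 & \frac{1}{2}(3 - i\sqrt{7})\end{pmatrix}.$$

It is easily seen by straight multiplication (the multiplication is left to the student) that, in fact,

$$QDQ^{-1} = A. \tag{4.2.2}$$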
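The multiplication left to the student in (4.2.2) can be cross-checked in floating-point complex arithmetic. The Python sketch below assumes the matrix with rows (1, -2), (1, 2), the eigenvalues $\frac{3}{2} \pm i\frac{\sqrt{7}}{2}$ and the eigenvectors with first components $-\frac{1}{2} \pm i\frac{\sqrt{7}}{2}$ and second component 1. The helper names are hypothetical, and the comparison uses a small tolerance because the arithmetic is not exact.

```python
import cmath


def matmul(a, b):
    """Plain matrix product of two lists of rows (complex entries allowed)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a nonsingular 2 x 2 matrix in floating-point arithmetic."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def max_abs_difference(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a[0])))


A = [[1, -2], [1, 2]]
root = cmath.sqrt(-7)                 # i * sqrt(7)
lam1, lam2 = (3 + root) / 2, (3 - root) / 2

Q = [[(-1 + root) / 2, (-1 - root) / 2],   # columns are X1 and X2
     [1, 1]]
D = [[lam1, 0], [0, lam2]]

residual_eigen = max_abs_difference(matmul(A, Q), matmul(Q, D))
residual_diag = max_abs_difference(matmul(matmul(Q, D), inverse_2x2(Q)), A)

print(residual_eigen < 1e-12, residual_diag < 1e-12)  # True True
```

Both residuals are at the level of rounding error, consistent with $AQ = QD$ and $QDQ^{-1} = A$.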




(iii) Even if all the elements of a matrix A are real the eigenvalues could be real, 
rational, irrational or complex quantities. If the elements of A are real and if an 
eigenvalue is real then the corresponding eigenvector is real and if an eigenvalue is 
complex the corresponding eigenvector is complex. 

(iv) $Q$ in the representation of $A$ in (4.2.2) is not unique. Multiply $Q$ by a scalar $k \ne 0$; then $Q^{-1}$ is multiplied by $\frac{1}{k}$ and $A$ remains the same whereas $Q$ is changed.


Example 4.2.4. Evaluate the eigenvalues and the eigenvectors of the matrix

$$A = \begin{pmatrix}2&1\\0&2\end{pmatrix}.$$

Solution 4.2.4. Since $A$ is triangular with the diagonal elements 2 and 2 the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 2$. An eigenvector corresponding to $\lambda_1 = 2$ is available from

$$(A - \lambda_1I)X = O \Rightarrow \begin{pmatrix}2 - 2 & 1\\ 0 & 2 - 2\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix} \Rightarrow \begin{pmatrix}0&1\\0&0\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}.$$

Thus a general solution is $x_1 = a$, $x_2 = 0$, $a \ne 0$. Since the rank of the matrix $\begin{pmatrix}0&1\\0&0\end{pmatrix}$ is 1 its null space has only one linearly independent vector or the space consists of scalar multiples of $\binom{1}{0}$. We cannot find two linearly independent eigenvectors corresponding to the two roots $\lambda_1 = 2$, $\lambda_2 = 2$. Thus if $Q$ denotes the matrix of eigenvectors then $Q$ is singular. Therefore this matrix $A$ does not admit a decomposition $A = QDQ^{-1}$ where $D$ is the diagonal matrix with the diagonal elements being the eigenvalues of $A$ and $Q$ is the matrix of eigenvectors. We will see later that $Q$ is singular here not because all the eigenvalues are equal or repeated but because of the special nature of the matrix involved.


Example 4.2.5. Construct a $2 \times 2$ matrix $B$ whose eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 2$ and which can be written as $HDH^{-1}$ for some $H$, $|H| \ne 0$, $D = \mathrm{diag}(2, 2)$.

Solution 4.2.5. Take any $2 \times 2$ nonsingular matrix $H$ and consider $HDH^{-1}$. Since $D$ is 2 times an identity matrix

$$HDH^{-1} = 2IHH^{-1} = 2I = D.$$

When $D$ is a scalar matrix such as the one here, $D = cI$, $c = 2$ here, then any non-null $n$-vector, $n = 2$ here, is an eigenvector and $n$ such linearly independent vectors can be constructed and $H$ consists of the eigenvectors. If a given $n \times n$ matrix has eigenvalues $\lambda_1$ repeated $n$ times can it be written in the form $QDQ^{-1}$ where $D = \mathrm{diag}(\lambda_1,\ldots,\lambda_1)$? In some cases it is possible and in some cases it is not possible.

If a matrix has distinct eigenvalues, may be one of them is zero, what can we say about the corresponding eigenvectors? Are they linearly independent or dependent? Let $\lambda_1$ and $\lambda_2$, $\lambda_1 \ne \lambda_2$, be two distinct eigenvalues of a matrix $A$ and let $X_1$ and $X_2$ be two eigenvectors corresponding to these eigenvalues. If $X_1$ and $X_2$ are linearly dependent then there exists a non-null vector $(c_1, c_2)$ such that $c_1X_1 + c_2X_2 = O$.

$$AX_1 = \lambda_1X_1 \Rightarrow Ac_1X_1 = \lambda_1c_1X_1$$
$$AX_2 = \lambda_2X_2 \Rightarrow Ac_2X_2 = \lambda_2c_2X_2$$
$$\Rightarrow A(c_1X_1 + c_2X_2) = (\lambda_1c_1X_1 + \lambda_2c_2X_2).$$

That is,

$$\lambda_1c_1X_1 + \lambda_2c_2X_2 = O \tag{a}$$

as well as $c_1X_1 + c_2X_2 = O$ by assumption. From this

$$c_1\lambda_2X_1 + c_2\lambda_2X_2 = O. \tag{b}$$

Subtracting (b) from (a) we get

$$(\lambda_1 - \lambda_2)c_1X_1 = O \Rightarrow (\lambda_1 - \lambda_2)c_1 = 0.$$

But $\lambda_1 \ne \lambda_2$. Then $c_1 = 0$. Similarly $c_2 = 0$, which means that $X_1$ and $X_2$ are linearly independent by the definition of linear independence. Note that the above proof does not depend on whether $\lambda_1$ and $\lambda_2$ are real, including one of them zero, or in the complex field and the elements of the eigenvectors could be real or in the complex field. The above method is applicable in the general situation. But if we are only concerned about just two eigenvalues then if the eigenvectors are dependent, one is a scalar multiple of the other. Then the result follows in two steps by using this property.


(v) If the eigenvalues of an n x n matrix are all distinct, some may be in the complex 
field, one may be zero also, then there are n linearly independent eigenvectors. Fur- 
ther, eigenvectors corresponding to distinct eigenvalues are linearly independent. 


Thus we are guaranteed that the matrix of eigenvectors will be nonsingular if all the 
eigenvalues are distinct. This does not mean that if some eigenvalues are repeated 
then the matrix of eigenvectors is singular. Example 4.2.3 gives a counter example. 
Hence the situation is that in some cases when some eigenvalues are repeated the 
matrix of eigenvectors can become singular. 


4.2.2 Eigenvalues of powers of a matrix 


Let $X_1$ be an eigenvector corresponding to the eigenvalue $\lambda_1$ of a matrix A. Then

$$AX_1 = \lambda_1X_1.$$

Premultiplying by A we have

$$A^2X_1 = \lambda_1AX_1 = \lambda_1^2X_1$$

which means that $\lambda_1^2$ is an eigenvalue of $A^2$ with the same eigenvector $X_1$. Extending
this result we have the following:


(vi) If $\lambda$ is an eigenvalue of A with the eigenvector X then $\lambda^k$ is an eigenvalue of $A^k$
with the same eigenvector X of A, for k = 0, 1, 2, ....


If $A^{-1}$ exists what about the eigenvalues of $A^{-1}$ in terms of the eigenvalues of A? Let $\lambda_1$
be an eigenvalue of A and $X_1$ an eigenvector corresponding to $\lambda_1$. [If $A^{-1}$ exists then all
eigenvalues are nonzero.]

$$AX_1 = \lambda_1X_1.$$

If $A^{-1}$ exists then premultiply by $A^{-1}$ to obtain

$$A^{-1}AX_1 = \lambda_1A^{-1}X_1 \Rightarrow A^{-1}X_1 = \frac{1}{\lambda_1}X_1$$

which means that $\frac{1}{\lambda_1}$ is an eigenvalue of $A^{-1}$. Extending this result we have the
following:


(vii) If all eigenvalues $\lambda_1,\ldots,\lambda_n$ of A are nonzero then $A^{-k}$ has the eigenvalues
$\lambda_1^{-k},\ldots,\lambda_n^{-k}$, for k = 0, 1, 2, ... with the same eigenvectors as those of A.
(viii) If the matrix of eigenvectors Q is nonsingular then we have

$$A^k = AA\cdots A = QDQ^{-1}\,QDQ^{-1}\cdots QDQ^{-1} = QD^kQ^{-1}, \quad D = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$$

which also means

$$Q^{-1}A^kQ = D^k$$

where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of A, for k = 0, 1, 2, ..., and if A is nonsingular
then for k = 0, -1, -2, ... also.

(ix) A being nonsingular means no eigenvalue of A is zero, Q being nonsingular 
means there is a set of n linearly independent eigenvectors for the n x n matrix A. 
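
Results (vii) and (viii) can be checked on a small case. The following is a minimal sketch in Python using exact fractions; the 2 x 2 matrix, its eigenvector matrix `Q` and the helper names are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matrix_power_direct(A, k):
    """Compute A^k for k >= 0 by repeated multiplication."""
    n = len(A)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

# Illustrative matrix (an assumption): A = [[2, 1], [1, 2]] has eigenvalues 1 and 3
# with eigenvectors (1, -1)' and (1, 1)', which form the columns of Q.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(2)]]
Q = [[Fraction(1), Fraction(1)], [Fraction(-1), Fraction(1)]]
Q_inv = [[Fraction(1, 2), Fraction(-1, 2)], [Fraction(1, 2), Fraction(1, 2)]]
eigenvalues = [Fraction(1), Fraction(3)]

def power_by_diagonalization(k):
    """Compute A^k as Q D^k Q^{-1}; negative k is allowed because no eigenvalue is zero."""
    D_k = [[eigenvalues[i] ** k if i == j else Fraction(0) for j in range(2)]
           for i in range(2)]
    return matmul(matmul(Q, D_k), Q_inv)

print(power_by_diagonalization(3))       # same entries as A multiplied by itself three times
print(power_by_diagonalization(-1))      # the inverse of A
```

Only the diagonal entries of D are raised to the power k, so the cost of the diagonalized form does not grow with k once Q and its inverse are known.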


Definition 4.2.2 (Definiteness of real symmetric matrices). Definiteness is defined
only for symmetric matrices, when real, and Hermitian matrices, when in the complex
domain. If all the eigenvalues of an n x n real symmetric matrix A are strictly
positive (eigenvalues of a real symmetric as well as Hermitian matrix are always real)
then A is called positive definite (if the eigenvalues are $\geq 0$ then A is positive
semidefinite); if all are strictly negative then A is negative definite (if $\leq 0$ then
negative semidefinite); and if some eigenvalues are negative and some positive, at least
one in each set, then the matrix A is called indefinite.


For example, we have the following situations when the matrix is real symmetric: 


Eigenvalues     Singularity     Definiteness

1, 1, 5         nonsingular     positive definite
0, 1, 4         singular        positive semidefinite
-1, -3, -8      nonsingular     negative definite
0, -3, -1       singular        negative semidefinite
1, 5, -2        nonsingular     indefinite
0, 1, -3        singular        indefinite
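
The classification in Definition 4.2.2 depends only on the signs of the eigenvalues, so it can be sketched as a small function. This is a minimal Python illustration; the function name `classify` is hypothetical, and the input is assumed to be the list of real eigenvalues of a real symmetric matrix.

```python
def classify(eigenvalues):
    """Return (singularity, definiteness) from the real eigenvalues of a symmetric matrix."""
    singularity = "singular" if any(ev == 0 for ev in eigenvalues) else "nonsingular"
    if all(ev > 0 for ev in eigenvalues):
        kind = "positive definite"
    elif all(ev >= 0 for ev in eigenvalues):
        # Note: a matrix with all eigenvalues zero lands here, although it is
        # also negative semidefinite.
        kind = "positive semidefinite"
    elif all(ev < 0 for ev in eigenvalues):
        kind = "negative definite"
    elif all(ev <= 0 for ev in eigenvalues):
        kind = "negative semidefinite"
    else:
        kind = "indefinite"
    return singularity, kind

for row in ([0, 1, 4], [-1, -3, -8], [0, -3, -1], [1, 5, -2], [0, 1, -3]):
    print(row, classify(row))
```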


4.2.3 Eigenvalues and eigenvectors of real symmetric matrices 


Let A = A', a real symmetric n x n matrix. Let $\lambda_1$ and $\lambda_2$ be two eigenvalues of A and $X_1$
and $X_2$ the corresponding eigenvectors. Then

$$AX_1 = \lambda_1X_1 \Rightarrow X_2'AX_1 = \lambda_1X_2'X_1 \tag{a}$$
$$AX_2 = \lambda_2X_2 \Rightarrow X_1'AX_2 = \lambda_2X_1'X_2. \tag{b}$$

But when A = A' we have

$$(X_2'AX_1)' = X_1'A'X_2 = X_1'AX_2.$$

Hence the left sides of (a) and (b) are equal, because both quantities are 1 x 1 matrices
and one is the transpose of the other. Similarly $X_1'X_2 = X_2'X_1$. Then from (a) and (b) we
have

$$(\lambda_1 - \lambda_2)X_1'X_2 = 0.$$

This can happen when either $\lambda_1 = \lambda_2$ or $X_1'X_2 = 0$ or both hold. If $\lambda_1$ and $\lambda_2$ are distinct
then $X_1'X_2 = 0$ which means that the eigenvectors are orthogonal to each other.


(x) When the matrix A is real symmetric the eigenvectors corresponding to distinct 
eigenvalues are orthogonal to each other. 


This leads to some very interesting results. A scalar multiple of an eigenvector is also
an eigenvector. If A is symmetric and if we have n linearly independent eigenvectors
then the matrix of eigenvectors can be made into an orthonormal matrix, say P. For
orthonormal matrices we know that the transposes are the inverses. Then we have a
representation for A = A'.

$$A = PDP', \quad A = A', \quad PP' = I, \quad P'P = I$$

where D is a diagonal matrix with the diagonal elements being the eigenvalues of A,
and P is the matrix of normalized eigenvectors.


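Example 4.2.6. Compute the eigenvalues and the eigenvectors of

$$A = \begin{bmatrix} 1 & 2\\ 2 & 4 \end{bmatrix}.$$

Solution 4.2.6.

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix} 1-\lambda & 2\\ 2 & 4-\lambda \end{vmatrix} = 0 \Rightarrow (1-\lambda)(4-\lambda) - 4 = 0 \Rightarrow \lambda(\lambda - 5) = 0.$$

Therefore $\lambda_1 = 0$, $\lambda_2 = 5$ are the eigenvalues of A. [Since the eigenvalues are positive or
zero this symmetric matrix A is positive semi-definite.] Let us compute the eigenvectors.
For $\lambda_1 = 0$,

$$(A - \lambda_1I)X = O \Rightarrow \begin{bmatrix} 1 & 2\\ 2 & 4 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix} \Rightarrow Y_1 = \begin{bmatrix} \frac{2}{\sqrt{5}}\\ -\frac{1}{\sqrt{5}} \end{bmatrix}$$

is a normalized eigenvector corresponding to $\lambda_1 = 0$. For $\lambda_2 = 5$,

$$(A - \lambda_2I)X = O \Rightarrow \begin{bmatrix} -4 & 2\\ 2 & -1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix} \Rightarrow X_2 = \begin{bmatrix} 1\\ 2 \end{bmatrix}$$

is an eigenvector. A normalized eigenvector is

$$Y_2 = \begin{bmatrix} \frac{1}{\sqrt{5}}\\ \frac{2}{\sqrt{5}} \end{bmatrix}.$$

Can there be different normalized eigenvectors for the eigenvalue $\lambda_2 = 5$? The answer
is in the affirmative:

$$\begin{bmatrix} -\frac{1}{\sqrt{5}}\\ -\frac{2}{\sqrt{5}} \end{bmatrix}$$

is another one, normalized as well as orthogonal to $Y_1$. Then the matrix of normalized
eigenvectors, denoted by P, is given by

$$P = \begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}.$$

Note the following properties for P. The length of each row vector is 1. The length of
each column vector is 1. The row (column) vectors are orthogonal to each other. That
is, P is an orthonormal matrix. Then we have

$$AP = P\Lambda \Rightarrow \begin{bmatrix} 1 & 2\\ 2 & 4 \end{bmatrix}\begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix} = \begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}}\\ -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}\begin{bmatrix} 0 & 0\\ 0 & 5 \end{bmatrix}$$

where $\Lambda$ is the diagonal matrix of eigenvalues. Since P' is the inverse of P when P is
orthonormal we have

$$A = P\Lambda P'. \tag{4.2.3}$$
This is a very important representation for symmetric matrices A, in general.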
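
The representation (4.2.3) can be checked numerically for the matrix of Example 4.2.6. The sketch below is a plain-Python illustration; the helper names are hypothetical and the signs chosen for the normalized eigenvectors in `P` are one valid choice among several.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]

A = [[1.0, 2.0], [2.0, 4.0]]
s = math.sqrt(5.0)
# Columns of P: normalized eigenvectors for the eigenvalues 0 and 5.
P = [[2 / s, 1 / s], [-1 / s, 2 / s]]
Lam = [[0.0, 0.0], [0.0, 5.0]]

reconstructed = matmul(matmul(P, Lam), transpose(P))   # P Lambda P'
identity_check = matmul(P, transpose(P))               # P P', expected to be I

print(reconstructed)
print(identity_check)
```

Up to floating-point rounding, `reconstructed` agrees with A and `identity_check` agrees with the identity matrix.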


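Example 4.2.7. Compute the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 1 & 0 & -1\\ 0 & 3 & 0\\ -1 & 0 & 1 \end{bmatrix}.$$

Solution 4.2.7. Consider the equation

$$|A - \lambda I| = 0 \Rightarrow \begin{vmatrix} 1-\lambda & 0 & -1\\ 0 & 3-\lambda & 0\\ -1 & 0 & 1-\lambda \end{vmatrix} = 0 \Rightarrow (3-\lambda)(-\lambda)(2-\lambda) = 0.$$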
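
Therefore $\lambda_1 = 0$, $\lambda_2 = 2$, $\lambda_3 = 3$ are the eigenvalues. [The matrix A is positive
semidefinite.] Let us compute the eigenvectors. For $\lambda_1 = 0$,

$$(A - \lambda_1I)X = O \Rightarrow \begin{bmatrix} 1 & 0 & -1\\ 0 & 3 & 0\\ -1 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \Rightarrow X_1 = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}$$

is one such eigenvector. The corresponding normalized vector is

$$Y_1 = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{bmatrix}.$$

For $\lambda_2 = 2$,

$$(A - \lambda_2I)X = O \Rightarrow \begin{bmatrix} -1 & 0 & -1\\ 0 & 1 & 0\\ -1 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \Rightarrow X_2 = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix} \text{ or } Y_2 = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ 0\\ -\frac{1}{\sqrt{2}} \end{bmatrix}$$

is a solution. For $\lambda_3 = 3$,

$$(A - \lambda_3I)X = O \Rightarrow \begin{bmatrix} -2 & 0 & -1\\ 0 & 0 & 0\\ -1 & 0 & -2 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \Rightarrow X_3 = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} \Rightarrow Y_3 = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}.$$

Note that the matrix $(A - \lambda_iI)$, in each case i = 1, 2, 3, is singular with rank 2. Hence in
the class of linearly independent vectors X there can be only one, 3 - 2 = 1, vector for a
given eigenvalue. [This is a general property when the eigenvalues are distinct for any
matrix, need not be symmetric.] Let us consider the matrix of normalized eigenvectors:

$$P = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0\\ 0 & 0 & 1\\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \end{bmatrix}, \quad PP' = I, \quad P'P = I.$$

We have the representation

$$A = P\Lambda P'$$

where

$$\Lambda = \begin{bmatrix} 0 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 3 \end{bmatrix},$$

the diagonal matrix of eigenvalues of A.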


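We may also observe one interesting property from the representation $A = P\Lambda P'$.
We can always write

$$
\Lambda = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 0 \end{bmatrix}
+ \cdots +
\begin{bmatrix} 0 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_n \end{bmatrix}
= \Lambda_1 + \Lambda_2 + \cdots + \Lambda_n.
$$

That is, $\Lambda$ is written as a sum of diagonal matrices of which the j-th one, namely $\Lambda_j$,
has $\lambda_j$ as the j-th diagonal element and all other elements zeros for j = 1, ..., n. Then

$$P\Lambda P' = P\Lambda_1P' + P\Lambda_2P' + \cdots + P\Lambda_nP'.$$

Note that P postmultiplied by $\Lambda_j$ gives the j-th column of P multiplied by $\lambda_j$ and all
other columns multiplied by zeros, and when this is postmultiplied by P' only the j-th
column of P and its transpose contribute. Thus if $P_1, \ldots, P_n$ denote
the columns of P then we have the following result:

$$\text{(xi)} \quad A = P\Lambda P' = \lambda_1P_1P_1' + \cdots + \lambda_nP_nP_n'. \tag{4.2.4}$$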
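
Result (xi) lends itself to a direct numerical check. The following is a minimal Python sketch that rebuilds the matrix of Example 4.2.7 from its eigenvalues 0, 2, 3 and the normalized eigenvectors found there; the function names `outer` and `spectral_sum` are hypothetical.

```python
import math

def outer(u, v):
    """Outer product u v' of two vectors, returned as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def spectral_sum(eigenvalues, columns):
    """Sum of lambda_j * P_j P_j' over the columns P_j of an orthonormal matrix P."""
    n = len(columns[0])
    total = [[0.0] * n for _ in range(n)]
    for lam, p in zip(eigenvalues, columns):
        term = outer(p, p)
        for i in range(n):
            for j in range(n):
                total[i][j] += lam * term[i][j]
    return total

r = 1 / math.sqrt(2)
eigenvalues = [0.0, 2.0, 3.0]
columns = [[r, 0.0, r], [r, 0.0, -r], [0.0, 1.0, 0.0]]   # P_1, P_2, P_3

A = spectral_sum(eigenvalues, columns)
print(A)
```

Up to rounding, the printed matrix has rows (1, 0, -1), (0, 3, 0), (-1, 0, 1). The term for the zero eigenvalue contributes nothing, which is why a singular symmetric matrix is determined by the eigenvectors of its nonzero eigenvalues alone.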
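
In Example 4.2.7 if we write by using this representation then we have

$$
\begin{aligned}
\begin{bmatrix} 1 & 0 & -1\\ 0 & 3 & 0\\ -1 & 0 & 1 \end{bmatrix}
&= 0\begin{bmatrix} \frac{1}{\sqrt{2}}\\ 0\\ \frac{1}{\sqrt{2}} \end{bmatrix}\left[\tfrac{1}{\sqrt{2}}, 0, \tfrac{1}{\sqrt{2}}\right]
+ 2\begin{bmatrix} \frac{1}{\sqrt{2}}\\ 0\\ -\frac{1}{\sqrt{2}} \end{bmatrix}\left[\tfrac{1}{\sqrt{2}}, 0, -\tfrac{1}{\sqrt{2}}\right]
+ 3\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}[0, 1, 0]\\
&= O + 2\begin{bmatrix} \frac{1}{2} & 0 & -\frac{1}{2}\\ 0 & 0 & 0\\ -\frac{1}{2} & 0 & \frac{1}{2} \end{bmatrix}
+ 3\begin{bmatrix} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix}.
\end{aligned}
$$

Both sides are equal. In the general case when we have the representation,

$$A = Q\Lambda Q^{-1}$$

where A need not be symmetric, note that the columns of $Q^{-1}$ or the rows of $Q^{-1}$ are not
directly available from the corresponding rows or columns of Q as in the orthonormal
case. Hence, first $Q^{-1}$ has to be evaluated. Then look at the rows of $Q^{-1}$. Let the columns
of Q be $Q_1, \ldots, Q_n$ and the rows of $Q^{-1}$ be $R_1, \ldots, R_n$. Then the representation will be the
following:

$$\text{(xii)} \quad A = \lambda_1Q_1R_1 + \cdots + \lambda_nQ_nR_n. \tag{4.2.5}$$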


Let us verify this for the A in Example 4.2.1. There the various quantities are already 
evaluated. 


Then, according to the notation in (4.2.5) 


2 1 
a-[7] q- 1. R - |-i.1]. R - [3,4], 
A, 21, À = 4. Then 
The result is verified.


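Example 4.2.8. Evaluate the eigenvalues and the eigenvectors of A and represent, if
possible, A in the form $A = P\Lambda P'$ where

$$A = \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}.$$

Solution 4.2.8. The eigenvalues of such an n x n matrix are already evaluated in
Example 4.1.3. One eigenvalue is 1 and the rest are zeros. In our case the eigenvalues are
$\lambda_1 = 1$, $\lambda_2 = 0$, $\lambda_3 = 0$. The eigenvectors corresponding to $\lambda_1 = 1$ are given by

$$(A - \lambda_1I)X = O \Rightarrow \begin{bmatrix} -\frac{2}{3} & \frac{1}{3} & \frac{1}{3}\\ \frac{1}{3} & -\frac{2}{3} & \frac{1}{3}\\ \frac{1}{3} & \frac{1}{3} & -\frac{2}{3} \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix} \Rightarrow X_1 = \begin{bmatrix} 1\\ 1\\ 1 \end{bmatrix}$$

is an eigenvector. The normalized $X_1$ is

$$Y_1 = \begin{bmatrix} \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}}\\ \frac{1}{\sqrt{3}} \end{bmatrix}.$$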
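Now consider

$$(A - \lambda_2I)X = O \Rightarrow \frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}.$$

We can obtain 2 linearly independent X since the rank of A is 1. For example

$$X_2 = \begin{bmatrix} 1\\ 0\\ -1 \end{bmatrix} \quad\text{and}\quad X_3 = \begin{bmatrix} 1\\ -2\\ 1 \end{bmatrix}$$

are two such vectors. The normalized vectors corresponding to these are

$$Y_2 = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ 0\\ -\frac{1}{\sqrt{2}} \end{bmatrix} \quad\text{and}\quad Y_3 = \begin{bmatrix} \frac{1}{\sqrt{6}}\\ -\frac{2}{\sqrt{6}}\\ \frac{1}{\sqrt{6}} \end{bmatrix}.$$

Hence the matrix of normalized eigenvectors is

$$P = \begin{bmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & 0 & -\frac{2}{\sqrt{6}}\\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \end{bmatrix}.$$

It is evident that P is an orthonormal matrix, PP' = I, P'P = I. Hence

$$A = P\Lambda P', \quad \Lambda = \begin{bmatrix} 1 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$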


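Can we write this A in the form (4.2.4)?

$$A = \lambda_1P_1P_1' + \lambda_2P_2P_2' + \lambda_3P_3P_3' = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3}\\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} + O + O.$$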

The result is verified. Here in our example two eigenvalues were equal but still we 
could get 3 eigenvectors which were orthonormal. This, in fact, is a general result for 
symmetric matrices. It need not hold for nonsymmetric matrices. In nonsymmetric 
cases in some situations it is possible to obtain a complete set of orthonormal vectors 
and some cases it is not possible. In Chapter 2 it was illustrated that any symmetric 
matrix A can always be written in the form 


A=QDQ' 


through elementary operations on the left and on the right, where D is diagonal and Q 
is nonsingular. Now the question remains: can we select a D and a Q for any given sym- 
metric matrix A such that D contains all the eigenvalues of A and Q is orthonormal? 
The answer to this is in the affirmative. We will establish this general result after intro- 
ducing some aspects of complex numbers, and matrices whose elements are complex 
numbers, in the next section. 


Before concluding this section observe the following points: Computations of
eigenvalues and eigenvectors are, in general, difficult problems. Even for a 3 x 3 matrix
the characteristic equation, $|A - \lambda I| = 0$, is a cubic equation. Often one may have
to use a computer to obtain the roots or use the complicated formula for the three
roots of a cubic equation. When the degree of the characteristic polynomial is large
the only practical way to solve it may be to use a computer. Also the characteristic
equation can produce irrational or complex roots.


Exercises 4.2 


4.2.1. Compute the eigenvalues and eigenvectors of A, and if possible, represent A in
the form $A = QDQ^{-1}$, where


(a) a-l al (b) a-f ch (c) 4-|^ 3! 
1 
2 


1 0 0 -1 0 
0 1 
(d) a-lo T (e) A= 


4.2.2. Repeat Exercise 4.2.1 for 


13 1 1 0 1 2 1 
(a) isl k (D A=]1 -1 2|, () A212 4 2 
3 0 4 3.6 3 
4.2.3. Repeat Exercise 4.2.1 for 
11 11 1 1 1 1 
1/1 1 1 1 1 1 1 1 
(a) A-- , (b) A= ; 
4|1 1 1 1 1 1 1 1 
1 1 1 1 1 1 1 1 
1 1 1 1 
1 -1 1 -1 
A= 
(0) 1 -1 -1 1 
1 1 -1 -1 


4.2.4. Construct a 3 x 3 matrix whose eigenvalues are A, = 1, A, = 2, A; = 3 and whose 
eigenvectors are 


Is this matrix unique or can you find one more matrix with the same eigenvalues and 
eigenvectors? 


4.2.5. Find two different matrices $Q_1$ and $Q_2$ such that the matrix in Exercise 4.2.3 (b)
can be written in the form

$$A = Q_1\Lambda Q_1^{-1} = Q_2\Lambda Q_2^{-1}$$

where $\Lambda$ is the diagonal matrix with the eigenvalues of A as the diagonal elements.


4.2.6. Evaluate the eigenvalues and eigenvectors of the following matrices. Let Q
denote the matrix of eigenvectors for each case. Is Q nonsingular?

$$A = \begin{bmatrix} 2 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 2 & 3\\ 0 & 4 & -1\\ 0 & 0 & 0 \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 & 0\\ 2 & 0 & 0\\ 1 & -1 & 1 \end{bmatrix}.$$


4.2.7. From the results obtained in Exercise 4.2.3 compute the determinant of A with- 
out any additional computation. 


4.2.8. Compute the eigenvalues and eigenvectors of A? for each case of (1) Exer- 
cise 4.2.1, (2) Exercise 4.2.2. List the cases where A^? is defined. 


4.2.9. Compute A for each case of A in Exercise 4.2.1. 


4.2.10. Let $A = (a_{ij})$ and $B = (b_{ij})$ be n x n matrices. By straight multiplication and then
taking the traces show that

(a) $\mathrm{tr}(AB) = \mathrm{tr}(BA)$, (b) $AB - BA \neq I$.


4.2.11. If a matrix A has eigenvalues 1, -1, 2 is the matrix Q of eigenvectors (a) singular?
(b) orthonormal?


4.2.12. Repeat Exercise 4.2.11 if the eigenvalues are 
(a) 1,1,2, (b) 1,0,0. 


4.2.13. Let 


Compute (a) tr(A%), (b) |A~°| (the determinant of A779). 


4.2.14. If a matrix A can be written as $A = QDQ^{-1}$ show that
$$\mathrm{tr}(A^{10}) = d_1^{10} + \cdots + d_n^{10}$$
where $d_1, \ldots, d_n$ are the diagonal elements in D.


4.2.15. If A is a 3 x 3 symmetric matrix with the eigenvalues 0, 1, 2 construct 4 different
matrices B such that $B^2 = A$. [Here B is a square root of A.]


4.2.16. For the problem in Exercise 4.2.3 (a) represent A in the form of equation (4.2.4).

4.2.17. Represent A in the form of equation (4.2.5) where


1 1 0 
A=|1 -2 2 
3 4 




4.2.18. Evaluate the eigenvalues of the following matrices: 


a b .. b 


A- D where A isnxn 


PrPrPrPrPrP OOO ©}... 
PRP PrP PrP OO 0 
PRP PrP PrP OO 0 
PRP PrP OO Orr Rr 
PrPRODOORHRY, Av 
eene OO OR rR FR 
OOO rFR RFP FP FP RF ke 
oO OF FP RP FP Pe 


1 
1 
1 
1 
1 
1 
(0) 
0 
0 


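4.2.19. Companion matrix. Show that the n x n companion matrix A has the
characteristic polynomial

$$P(\lambda) = \lambda^n + \sum_{i=0}^{n-1} c_i\lambda^i$$

where

$$A = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ -c_0 & -c_1 & -c_2 & \cdots & -c_{n-1} \end{bmatrix}.$$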


4.2.20. Let A bean n x n matrix with the characteristic polynomial P(A) = Y, cl. 
Show that the scalar c,, 0 < r < n, is equal to the sum of all principal minors of order 
n -r of A multiplied by (-1)"". 


4.2.21. For the n x n real matrix A let $\lambda^n - a_1\lambda^{n-1} + a_2\lambda^{n-2} - \cdots \pm a_n = 0$ be the
characteristic equation. Then show that

$$A^n - a_1A^{n-1} + a_2A^{n-2} - \cdots \pm a_nI = O.$$


4.2.22. For a nonsingular 2 x 2 matrix A let $\lambda^2 - a_1\lambda + a_2 = 0$ be the characteristic
equation. Show that

$$a_2A^{-1} = a_1I - A.$$


4.3 Some properties of complex numbers and matrices in
the complex fields


In all the discussions so far the elements in our vectors and matrices were real num- 
bers or real quantities. Now we will extend our discussion to complex numbers and 
matrices with the elements in the complex field. 


4.3.1 Complex numbers 

The very basic quantity in the field of complex numbers is denoted by i which is the
positive square root of -1, that is, $i = \sqrt{-1}$. It can arise, for example, when we try to
solve the equation

$$x^2 + 1 = 0 \Rightarrow x^2 = -1 \Rightarrow x = \pm\sqrt{-1} \text{ or } x = \pm i.$$

A complex number can be written as z = a + ib where a and b are real numbers and
$i = \sqrt{-1}$. Then we say that a is the real part of z, written as $a = \Re(z)$, and b is the
imaginary part of z, written as $b = \Im(z)$. For example,


complex number    real part    imaginary part

2 + 3i            2            3
2 - 3i            2            -3
3i                0            3
2                 2            0


(i) A complex number with the imaginary part zero is a real number. A complex 
number with the real part zero is a purely imaginary number. 


Definition 4.3.1 (A complex number). Any number of the form a + ib where a and b
are real and $i = \sqrt{-1}$ is called a complex number.


Definition 4.3.2 (Complex conjugate). The complex conjugate of a + ib is defined as
a - ib (with i replaced by -i) so that

$$(a + ib)(a - ib) = a^2 + b^2$$
is always real.


Notation 4.3.1. The complex conjugate of z is usually denoted by $z^c$ or $\bar{z}$. Since $\bar{z}$ is
already used for the average we will denote the conjugate of z by $z^c$.


Definition 4.3.3 (The absolute value of a complex number). If z = a + ib is a complex
number its absolute value, denoted by |z|, is the positive square root of z multiplied
by its conjugate $z^c$. That is,

$$|z| = \sqrt{zz^c} = \sqrt{(a + ib)(a - ib)} = \sqrt{a^2 + b^2} = |a + ib| = |a - ib|.$$
For example,


complex number    its conjugate    its absolute value

3 - 4i            3 + 4i           $\sqrt{(3)^2 + (-4)^2} = 5$
1 + 2i            1 - 2i           $\sqrt{(1)^2 + (2)^2} = \sqrt{5}$
7i                -7i              $\sqrt{(0)^2 + (7)^2} = 7$
-2                -2               $\sqrt{(-2)^2 + (0)^2} = 2$


This definition of the absolute value is in agreement with the concept of the absolute 
value of a real number. Also real numbers can be taken as particular cases of complex 
numbers where the imaginary parts are zeros. 
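
The conjugate and the absolute value can be computed directly from the pair (a, b). The sketch below is an illustration in Python; the function names are hypothetical, and the last lines compare the results with Python's built-in complex type, where `1j` plays the role of i.

```python
import math

def conjugate(a, b):
    """Conjugate of a + ib, returned as the pair (a, -b)."""
    return (a, -b)

def absolute_value(a, b):
    """Positive square root of (a + ib)(a - ib) = a^2 + b^2."""
    return math.sqrt(a * a + b * b)

print(conjugate(3, -4), absolute_value(3, -4))   # the pair for 3 + 4i, and 5.0
print(absolute_value(1, 2))                      # square root of 5
print(absolute_value(-2, 0))                     # 2.0, the usual absolute value of -2

# The same quantities from the built-in complex type.
z = complex(3, -4)
print(z.conjugate(), abs(z))
```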


4.3.2 Geometry of complex numbers 


Since any complex number is of the form z = x + iy where x and y are real and
$i = \sqrt{-1}$, the pair of real quantities (x, y) uniquely determines the complex number z. If we
take a rectangular coordinate system and call the x-axis the real axis, and the y-axis
the imaginary axis then x + iy is the point (x, y) in this complex plane. Some complex
numbers are marked in Figure 4.3.1.


Figure 4.3.1: A complex plane. 


Let us take an arbitrary point z = x + iy. Call the origin of the above rectangular
coordinate system O and let P = (x, y) be the point z in this complex plane. Let $\theta$ be the
angle OP makes with the x-axis and r the length of OP. Then as shown in Figure 4.3.2

$$x = r\cos\theta, \quad y = r\sin\theta, \quad z = r\cos\theta + ir\sin\theta.$$


Figure 4.3.2: Geometry of complex numbers.


What about powers of z, such as $z^2$, $z^3$, ... or $z^k$ for some k, positive or negative? Trying
to take the powers directly as

$$z^k = (r\cos\theta + ir\sin\theta)^k = r^k(\cos\theta + i\sin\theta)^k$$

is not an easy process. But one thing is certain. When $(\cos\theta + i\sin\theta)^k$ is expanded it
can give terms containing $i, i^2, i^3, \ldots, i^k$. These powers of i can be reduced by using the
results $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, $i^5 = i$ and so on. Thus evidently, $(\cos\theta + i\sin\theta)^k$ gives rise
to a quantity of the form a + ib where a and b are real. Hence all powers of z are also
complex numbers. In order to evaluate the powers in a much simpler way we will look
at another representation of a complex number. To this end, consider the expansion

$$
\begin{aligned}
e^{i\theta} &= 1 + (i\theta) + \frac{(i\theta)^2}{2!} + \frac{(i\theta)^3}{3!} + \cdots\\
&= \left(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\right) + i\left(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\right).
\end{aligned}
$$

These two series are known from calculus or trigonometry as

$$\cos\theta = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots$$
and
$$\sin\theta = \theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots.$$
Hence
$$e^{i\theta} = \cos\theta + i\sin\theta \Rightarrow \tag{4.3.1}$$
$$z = r(\cos\theta + i\sin\theta) = re^{i\theta}. \tag{4.3.2}$$


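This is a very important formula to deal with complex numbers. For example,

$$
\begin{aligned}
z^2 &= [re^{i\theta}]^2 = r^2e^{2i\theta} = r^2[\cos 2\theta + i\sin 2\theta];\\
z^{-1} &= [re^{i\theta}]^{-1} = \frac{1}{r}e^{-i\theta} = \frac{1}{r}[\cos(-\theta) + i\sin(-\theta)] = \frac{1}{r}[\cos\theta - i\sin\theta], \quad r \neq 0;\\
z^k &= [re^{i\theta}]^k = r^ke^{ik\theta} = r^k[\cos k\theta + i\sin k\theta], \quad k = 0, 1, 2, \ldots, -1, -2, \ldots, \quad r \neq 0.
\end{aligned}
\tag{4.3.3}
$$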
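
Formula (4.3.3) can be tried out with Python's standard `cmath` module, which converts between rectangular and polar forms through `cmath.polar` and `cmath.rect`. This is a minimal sketch; the function name `power_polar` is hypothetical.

```python
import cmath

def power_polar(z, k):
    """Compute z^k as r^k (cos k*theta + i sin k*theta); z must be nonzero when k < 0."""
    r, theta = cmath.polar(z)          # z = r e^{i theta}
    return cmath.rect(r ** k, k * theta)

z = complex(1, 2)
print(power_polar(z, 2), z ** 2)       # both close to -3 + 4i
print(power_polar(z, -1), 1 / z)       # both close to 0.2 - 0.4i
print(power_polar(complex(1, 1), 4))   # close to -4
```

The polar route agrees with direct multiplication up to floating-point rounding, and it handles negative k without any division of complex numbers.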
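Could we have obtained the same result by powers of $z = r(\cos\theta + i\sin\theta)$ directly?
Yes, we could have arrived at the same results but the process would have involved
invoking many results from trigonometry. For example,

$$
\begin{aligned}
z^2 &= [r(\cos\theta + i\sin\theta)]^2 = r^2[\cos^2\theta + (i\sin\theta)^2 + 2i\cos\theta\sin\theta]\\
&= r^2[\cos^2\theta - \sin^2\theta + i(2\sin\theta\cos\theta)]\\
&= r^2[\cos 2\theta + i\sin 2\theta]
\end{aligned}
$$

since $\cos 2\theta = \cos^2\theta - \sin^2\theta$ and $\sin 2\theta = 2\sin\theta\cos\theta$. What about the exponents if we
take products and ratios? For example,

$$
\begin{aligned}
\frac{z^2}{z} &= \frac{r^2e^{2i\theta}}{re^{i\theta}} = re^{i\theta} = z, \quad r \neq 0;\\
z^2z^{-3} &= (r^2e^{2i\theta})(r^{-3}e^{-3i\theta}) = \frac{1}{r}e^{-i\theta} = z^{-1}, \quad r \neq 0;\\
zz^2 &= (re^{i\theta})(re^{i\theta})^2 = r^3e^{3i\theta} = z^3;\\
z^mz^n &= z^{m+n}.
\end{aligned}
$$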

The rules of multiplication and division can be carried through exactly as in the real
case. What about the complex conjugates? We have defined the conjugate of z = x + iy
as $z^c = x - iy$. The absolute values remain the same.

$$|z| = \sqrt{x^2 + y^2} = \sqrt{x^2 + (-y)^2} = |z^c|.$$

Writing x - iy as x + (-i)y we have

$$z^c = r(\cos\theta - i\sin\theta) = re^{-i\theta} \Rightarrow zz^c = (re^{i\theta})(re^{-i\theta}) = r^2$$

which is also seen from direct multiplication, $(x + iy)(x - iy) = x^2 + y^2 = r^2$.


4.3.3 Algebra of complex numbers 


What about the sums and products of complex numbers? Let

$$z_1 = x_1 + iy_1 \quad\text{and}\quad z_2 = x_2 + iy_2$$

where $x_1, x_2, y_1, y_2$ are real. If $(x_1, y_1) \neq (x_2, y_2)$ then we have two different complex numbers.
A complex number, its conjugate and powers are shown in Figure 4.3.3.


Figure 4.3.3: Powers of complex numbers.


Writing in exponential form we have

$$z_1 = r_1e^{i\theta_1}, \quad z_2 = r_2e^{i\theta_2}$$

where

$$r_1 = \sqrt{x_1^2 + y_1^2}, \quad r_2 = \sqrt{x_2^2 + y_2^2},$$
$$\theta_1 = \tan^{-1}\left(\frac{y_1}{x_1}\right), \quad \theta_2 = \tan^{-1}\left(\frac{y_2}{x_2}\right), \quad x_1 \neq 0, \quad x_2 \neq 0.$$

$$
\begin{aligned}
z_1 + z_2 &= (x_1 + iy_1) + (x_2 + iy_2) = (x_1 + x_2) + i(y_1 + y_2)\\
&= r_1e^{i\theta_1} + r_2e^{i\theta_2}\\
&= r_1[\cos\theta_1 + i\sin\theta_1] + r_2[\cos\theta_2 + i\sin\theta_2]\\
&= (r_1\cos\theta_1 + r_2\cos\theta_2) + i(r_1\sin\theta_1 + r_2\sin\theta_2).
\end{aligned}
\tag{4.3.4}
$$
All these are equivalent representations for $z_1 + z_2$. Thus $z_1 + z_2$ is again a complex
number.


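$$z_1z_2 = [x_1 + iy_1][x_2 + iy_2] = (x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1) \tag{4.3.5}$$
$$= (r_1e^{i\theta_1})(r_2e^{i\theta_2}) = r_1r_2e^{i(\theta_1 + \theta_2)} = r_1r_2[\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)].$$

$$
\begin{aligned}
zz^c &= (x + iy)(x - iy) = x^2 + y^2 = (re^{i\theta})(re^{-i\theta}) = r^2\\
&= [r(\cos\theta + i\sin\theta)][r(\cos\theta - i\sin\theta)] = r^2[\cos^2\theta + \sin^2\theta] = r^2.
\end{aligned}
\tag{4.3.6}
$$

What about the conjugate of the product $z_1z_2$?

$$
\begin{aligned}
(z_1z_2)^c &= [(x_1x_2 - y_1y_2) + i(x_1y_2 + x_2y_1)]^c = (x_1x_2 - y_1y_2) - i(x_1y_2 + x_2y_1)\\
z_1^cz_2^c &= [x_1 - iy_1][x_2 - iy_2] = (x_1x_2 - y_1y_2) - i(x_1y_2 + x_2y_1) = (z_1z_2)^c\\
&\Rightarrow (z_1z_2)^c = z_1^cz_2^c.
\end{aligned}
\tag{4.3.7}
$$

Taking the conjugate of a product or the product of the conjugates gives rise to the same
result. Some numerical examples are the following: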


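$$
\begin{aligned}
&z_1 = 2 + 3i \Rightarrow z_1^c = 2 - 3i, \quad |z_1| = \sqrt{2^2 + 3^2} = \sqrt{13}, \quad z_1 = \sqrt{13}\,e^{i\theta}, \quad \tan\theta = \tfrac{3}{2};\\
&z_2 = 3i \Rightarrow z_2^c = -3i, \quad |z_2| = \sqrt{(0)^2 + (3)^2} = 3, \quad \cos\theta = 0, \quad \sin\theta = 1, \quad \theta = \tfrac{\pi}{2}\\
&\qquad\Rightarrow z_2 = 3e^{i\pi/2} = 3e^{i(\pi/2 + 2m\pi)}, \quad m = 0, 1, 2, \ldots;\\
&z = 2, \quad z^c = 2, \quad |z| = \sqrt{(2)^2 + (0)^2} = 2, \quad \cos\theta = 1, \quad \sin\theta = 0, \quad \theta = 0\\
&\qquad\Rightarrow z = 2e^{i(0 + 2m\pi)} = 2 \text{ (in agreement with the rules for real numbers)};\\
&z_3 = 1 + 2i, \quad z_4 = 3 + 4i \Rightarrow z_3 + z_4 = (1 + 2i) + (3 + 4i) = 4 + 6i;\\
&z_3 - z_4 = (1 + 2i) - (3 + 4i) = -2 - 2i;\\
&z_3 + z_4^c = (1 + 2i) + (3 - 4i) = 4 - 2i = z_4^c + z_3;\\
&z_3 - z_4^c = (1 + 2i) - (3 - 4i) = -2 + 6i;\\
&2z_3 + 5z_4 = 2(1 + 2i) + 5(3 + 4i) = (2 + 4i) + (15 + 20i) = 17 + 24i;\\
&z_3z_4 = (1 + 2i)(3 + 4i) = [(1)(3) + i^2(2)(4)] + i[(1)(4) + (2)(3)] = 3 - 8 + 10i = -5 + 10i;\\
&z_3^2 = (1 + 2i)^2 = (1)^2 + (2i)^2 + 2(1)(2i) = 1 - 4 + 4i = -3 + 4i.
\end{aligned}
$$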


In the real case the square is a non-negative number. In the complex case we cannot 
talk about non-negativity since it is again a complex number. 


$$
\begin{aligned}
z_3z_4^c &= (1 + 2i)(3 - 4i) = [(1)(3) + (2)(-4)i^2] + [(1)(-4) + (2)(3)]i = 11 + 2i;\\
(z_3z_4)^c &= (-5 + 10i)^c = -5 - 10i;\\
z_3^cz_4^c &= (1 - 2i)(3 - 4i) = -5 - 10i = (z_3z_4)^c;\\
|z_3^cz_4^c| &= |-5 - 10i| = \sqrt{(-5)^2 + (-10)^2} = \sqrt{125} = |(z_3z_4)^c|;\\
|z_3||z_4| &= |(1 + 2i)|\,|(3 + 4i)| = \sqrt{1^2 + 2^2}\sqrt{3^2 + 4^2} = \sqrt{5}\sqrt{25} = \sqrt{125} = |z_3z_4|.
\end{aligned}
$$

This is a general result. If $z_1, \ldots, z_k$ are complex numbers then the absolute value of
the product or the absolute value of the product of their conjugates is the product of
the absolute values.

$$\text{(ii)} \quad |z_1z_2\cdots z_k| = |z_1||z_2|\cdots|z_k| = |z_1^c||z_2^c|\cdots|z_k^c| = |z_1^cz_2^c\cdots z_k^c|. \tag{4.3.8}$$
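
The identities (4.3.7) and (4.3.8) can be checked for the numbers 1 + 2i and 3 + 4i used in the examples above. The sketch below uses Python's built-in complex type; the variable names are illustrative.

```python
z3 = complex(1, 2)
z4 = complex(3, 4)

product = z3 * z4                                   # -5 + 10i
conj_of_product = product.conjugate()               # conjugate taken after multiplying
product_of_conj = z3.conjugate() * z4.conjugate()   # conjugates taken before multiplying

# Difference between |z3 z4| and |z3||z4|; zero up to floating-point rounding.
modulus_gap = abs(abs(product) - abs(z3) * abs(z4))

print(product, conj_of_product, product_of_conj)
print(abs(product), abs(z3) * abs(z4), modulus_gap)
```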


How do we compute the ratios of complex numbers? Since a complex number multiplied
by its conjugate is a real number we may multiply the complex numbers appearing
in the denominator by the conjugates to make the denominator real. This is a
convenient technique that can be employed when evaluating a ratio. Let $z_1 = x_1 + iy_1$
and $z_2 = x_2 + iy_2$, $z_2 \neq 0$. Then

$$
\begin{aligned}
\frac{z_1}{z_2} &= \frac{x_1 + iy_1}{x_2 + iy_2} = \frac{(x_1 + iy_1)(x_2 - iy_2)}{(x_2 + iy_2)(x_2 - iy_2)}\\
&= \frac{[x_1x_2 + y_1y_2] + i[-x_1y_2 + x_2y_1]}{x_2^2 + y_2^2}\\
&= \frac{(x_1x_2 + y_1y_2)}{x_2^2 + y_2^2} + i\frac{(-x_1y_2 + x_2y_1)}{x_2^2 + y_2^2}.
\end{aligned}
$$

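As numerical examples we have

$$\frac{(1 + 3i)}{(2 - i)} = \frac{(1 + 3i)(2 + i)}{(2 - i)(2 + i)} = \frac{[(1)(2) - (3)(1)] + i[(1)(1) + (3)(2)]}{2^2 + 1^2} = -\frac{1}{5} + \frac{7}{5}i;$$

$$\frac{1 + 3i}{5i} = \frac{(1 + 3i)(-5i)}{(5i)(-5i)} = \frac{3}{5} - \frac{1}{5}i.$$

Instead of -5i we could have kept 5 outside and multiplied both numerator and the
denominator by -i.

$$\frac{1 + 3i}{2} = \frac{1}{2} + \frac{3}{2}i;$$

$$\frac{(1 - 3i)}{(1 + i)(2 + i)} = \frac{(1 - 3i)(1 - i)(2 - i)}{(1 + i)(1 - i)(2 + i)(2 - i)} = \frac{(-2 - 4i)(2 - i)}{(1^2 + 1^2)(2^2 + 1^2)} = -\frac{4}{5} - \frac{3}{5}i;$$

$$\frac{(2 + i)}{5i(1 - i)(2 - i)} = \frac{(2 + i)}{5i(1 - i)(2 - i)} \times \frac{(-i)(1 + i)(2 + i)}{(-i)(1 + i)(2 + i)} = \frac{(2 + i)(-i)(1 + i)(2 + i)}{5(1^2 + 1^2)(2^2 + 1^2)} = \frac{7}{50} + \frac{1}{50}i.$$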

4.3.4 n-th roots of unity 


The square roots of 1 are available by solving the equation $x^2 = 1 \Rightarrow x = \pm 1$. If we
take the fourth root of 1 there must be 4 roots. What are they? $x^4 = 1 \Rightarrow x^2 = \pm 1$. Then
$x^2 = 1 \Rightarrow x = \pm 1$ and $x^2 = -1 \Rightarrow x = \pm i$. Hence the four roots of 1 are 1, -1, i, -i. What
about the n-th roots of 1 for any given n? There will be n roots. We must have a general
way of computing all the roots. This can be done through the representation of a
complex number in (4.3.2):

$$z = re^{i\theta} = r[\cos\theta + i\sin\theta] = r[\cos(\theta + 2m\pi) + i\sin(\theta + 2m\pi)] = re^{i(\theta + 2m\pi)}$$

for m = 0, 1, 2, ... since any multiple of $2\pi$ will bring back to the original position.
Therefore the n roots of z are given by

$$z^{1/n} = r^{1/n}e^{i(\theta + 2m\pi)/n}$$

where the positive n-th root of r is taken in $r^{1/n}$ and the remaining part is in the
exponential factor for various values of m. Note that for m = 0, 1, ..., n - 1 one set of n roots
is available. Then for m = n, n + 1, ... the same set of roots is repeated. Then writing

$$1 = e^{i(2m\pi)} \Rightarrow 1^{1/n} = e^{i(2m\pi/n)}, \quad m = 1, 2, \ldots, n.$$

But

$$e^{i(2m\pi/n)} = \cos\frac{2m\pi}{n} + i\sin\frac{2m\pi}{n}, \quad m = 1, 2, \ldots, n.$$

The n roots of 1 are then available from the above by substituting for various values
of m, m = 1, 2, ..., n. What about the n-th root of -1? Note that

$$-1 = \cos\pi + i\sin\pi = \cos(\pi + 2m\pi) + i\sin(\pi + 2m\pi) = e^{i(2m + 1)\pi}.$$

Therefore

$$(-1)^{1/n} = e^{i(2m + 1)\pi/n} = \cos\left(\frac{(2m + 1)\pi}{n}\right) + i\sin\left(\frac{(2m + 1)\pi}{n}\right)$$

for m = 1, ..., n. For example, the 4-th roots of 1 are

$$
\begin{aligned}
&\cos\frac{2m\pi}{4} + i\sin\frac{2m\pi}{4}, \quad m = 1, 2, 3, 4\\
&\Rightarrow \cos\frac{\pi}{2} + i\sin\frac{\pi}{2}, \quad \cos\pi + i\sin\pi, \quad \cos\left(\pi + \frac{\pi}{2}\right) + i\sin\left(\pi + \frac{\pi}{2}\right), \quad \cos 2\pi + i\sin 2\pi\\
&\Rightarrow 0 + i, \quad -1 + (0)i, \quad 0 - i, \quad 1\\
&\Rightarrow 1, -1, i, -i.
\end{aligned}
$$
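
The formulas for the n-th roots of 1 and of -1 can be evaluated directly. The following is a minimal Python sketch with the standard `cmath` and `math` modules; the function names are hypothetical, and the index m runs from 1 to n as in the text.

```python
import cmath
import math

def roots_of_unity(n):
    """The n-th roots of 1: e^{i 2 m pi / n} for m = 1, ..., n."""
    return [cmath.exp(1j * 2 * math.pi * m / n) for m in range(1, n + 1)]

def roots_of_minus_one(n):
    """The n-th roots of -1: e^{i (2m + 1) pi / n} for m = 1, ..., n."""
    return [cmath.exp(1j * (2 * m + 1) * math.pi / n) for m in range(1, n + 1)]

fourth_roots = roots_of_unity(4)
square_roots_of_minus_one = roots_of_minus_one(2)

print(fourth_roots)                  # close to i, -1, -i, 1
print(square_roots_of_minus_one)     # close to -i, i
```

The printed values carry rounding errors of the order of 1e-16 in the parts that are exactly zero in the hand computation.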

What about the square root of -1 from the formula?

$$
\begin{aligned}
&\cos\left((2m + 1)\frac{\pi}{2}\right) + i\sin\left((2m + 1)\frac{\pi}{2}\right), \quad m = 1, 2\\
&\Rightarrow \cos\frac{3\pi}{2} + i\sin\frac{3\pi}{2}, \quad \cos\frac{5\pi}{2} + i\sin\frac{5\pi}{2}\\
&\Rightarrow \cos\frac{\pi}{2} - i\sin\frac{\pi}{2}, \quad \cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\\
&\Rightarrow 0 - i, \quad 0 + i \Rightarrow i, -i
\end{aligned}
$$

are the roots. What about the n-th roots of any complex number a + ib? Take
$r = \sqrt{a^2 + b^2}$ and $\theta$ such that $\tan\theta = \frac{b}{a}$ for $a \neq 0$. If b = 0 then it is a real number. Then take
it as a multiplied by 1 if a > 0 or |a| multiplied by -1 if a < 0. Then take the positive
n-th root of |a| multiplied by all the n roots of 1 or -1 as the case may be. If a = 0 then
take it as ib and proceed as above taking the n-th positive root of |b| and all the n roots
of i or -i as the case may be.


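Example 4.3.1. Compute the 4-th roots of 1 + i.


Solution 4.3.1. We can write

$$1 + i = re^{i(\theta + 2m\pi)}, \quad \tan\theta = \frac{1}{1} = 1 \Rightarrow \theta = \frac{\pi}{4}, \quad r = \sqrt{1^2 + 1^2} = \sqrt{2}.$$

Hence the 4-th roots are $2^{1/8}e^{i(\pi/16 + m\pi/2)}$, m = 1, 2, 3, 4, that is, $2^{1/8}$ times each of

$$
\begin{aligned}
&\cos\left(\frac{\pi}{2} + \frac{\pi}{16}\right) + i\sin\left(\frac{\pi}{2} + \frac{\pi}{16}\right), \quad \cos\left(\pi + \frac{\pi}{16}\right) + i\sin\left(\pi + \frac{\pi}{16}\right),\\
&\cos\left(\frac{3\pi}{2} + \frac{\pi}{16}\right) + i\sin\left(\frac{3\pi}{2} + \frac{\pi}{16}\right), \quad \cos\left(2\pi + \frac{\pi}{16}\right) + i\sin\left(2\pi + \frac{\pi}{16}\right).
\end{aligned}
$$

That is,

$$2^{1/8}\left[-\sin\frac{\pi}{16} + i\cos\frac{\pi}{16}, \quad -\cos\frac{\pi}{16} - i\sin\frac{\pi}{16}, \quad \sin\frac{\pi}{16} - i\cos\frac{\pi}{16}, \quad \cos\frac{\pi}{16} + i\sin\frac{\pi}{16}\right].$$

What about the 4-th power of 1 + i?

$$1 + i = \sqrt{2}\,e^{i\pi/4} \Rightarrow (1 + i)^4 = 4e^{4i(\pi/4)} = 4e^{i\pi} = 4[\cos\pi + i\sin\pi] = -4.$$

Let us verify it by direct multiplication.

$$(1 + i)^2 = 1 + 2i + i^2 = 1 + 2i - 1 = 2i,$$
$$(1 + i)^4 = (2i)^2 = 4(-1) = -4.$$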
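
The computation in Example 4.3.1 can be repeated for any nonzero complex number. The sketch below is a Python illustration built on `cmath.polar` and `cmath.rect`; the function name `nth_roots` is hypothetical, and the index m runs from 1 to n as in the text.

```python
import cmath
import math

def nth_roots(z, n):
    """All n-th roots of a nonzero z: r^(1/n) e^{i (theta + 2 m pi) / n}, m = 1, ..., n."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1.0 / n), (theta + 2 * math.pi * m) / n)
            for m in range(1, n + 1)]

roots = nth_roots(complex(1, 1), 4)
for w in roots:
    print(w, w ** 4)                 # each fourth power is close to 1 + i

print(complex(1, 1) ** 4)            # the 4-th power of 1 + i, equal to -4
```

Raising each root back to the 4-th power is a convenient check, since it does not depend on how the angles were reduced.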


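Thus the representation of a complex number in terms of its length or absolute value
r, and $\theta$, such that $\tan\theta = \frac{b}{a}$, $a \neq 0$, gives a convenient way of finding the powers or
roots of any complex or real number. The representations for real as well as purely
imaginary numbers are the following:

$$
\begin{aligned}
1 &= e^{i(2m\pi)}, & m &= 0, 1, 2, \ldots\\
-1 &= e^{i(2m + 1)\pi}, & m &= 0, 1, 2, \ldots\\
i &= e^{i(\frac{\pi}{2} + 2m\pi)}, & m &= 0, 1, \ldots\\
-i &= e^{i(\frac{3\pi}{2} + 2m\pi)}, & m &= 0, 1, \ldots.
\end{aligned}
$$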

4.3.5 Vectors with complex elements 


Let us start with the concept of vectors as n-tuples or as ordered sets of n elements.
If some or all elements are complex what will happen to the various operations with
the vectors? Let U and V, $U' = (u_1, \ldots, u_n)$, $V' = (v_1, \ldots, v_n)$, be two n x 1 vectors. The
definition of a scalar multiple of a vector remains the same as in the real case:

$$cU' = (cu_1, \ldots, cu_n)$$
where c is a scalar in the real or complex field. Addition remains the same:
$$cU' + dV' = (cu_1 + dv_1, \ldots, cu_n + dv_n)$$

where c and d are scalars. The definition of linear independence remains the same.
Let $U_1, \ldots, U_k$ be n-vectors defined in the complex space $C^n$. [We had denoted real
Euclidean n-space by $R^n$. The complex n-space will be denoted by $C^n$. Since each complex
variable represents a pair of real variables one can look upon $C^n$ as corresponding to
$R^{2n}$.] Let $c_1, \ldots, c_k$ be scalars. If the equation

$$c_1U_1 + \cdots + c_kU_k = O$$

holds only when $c_1 = 0 = \cdots = c_k$, where O denotes the null vector, then $U_1, \ldots, U_k$ are
linearly independent. Otherwise they are linearly dependent, and when they are linearly
dependent then at least one of them can be written as a linear function of the
others. The scalars $c_1, \ldots, c_k$ could be real or complex.

The definitions of vector subspaces, their bases and dimensions remain the same 
as in the real case. Since the length or absolute value of a complex number is an ex- 
tension of the length in the real case the length of a vector has to be redefined when 
the elements are in the complex field. 
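
Definition 4.3.4 (The length of a vector). Let

$$Z = \begin{bmatrix} z_1\\ \vdots\\ z_n \end{bmatrix} = \begin{bmatrix} x_1 + iy_1\\ \vdots\\ x_n + iy_n \end{bmatrix} \Rightarrow X = \begin{bmatrix} x_1\\ \vdots\\ x_n \end{bmatrix}, \quad Y = \begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix}$$

where $x_1, \ldots, x_n, y_1, \ldots, y_n$ are all real and $i = \sqrt{-1}$. Then the length of Z, denoted by $\|Z\|$,
is defined as

$$\|Z\| = \sqrt{|z_1|^2 + \cdots + |z_n|^2} = \sqrt{(x_1^2 + y_1^2) + \cdots + (x_n^2 + y_n^2)} \tag{4.3.9}$$

or $\|Z\|$ satisfies the relation
$$\text{(iii)} \quad \|Z\|^2 = \|X\|^2 + \|Y\|^2. \tag{4.3.10}$$
For example,

$$
\begin{aligned}
Z &= \begin{bmatrix} 2\\ 3 - i \end{bmatrix} \Rightarrow \|Z\|^2 = [(2)^2] + [(3)^2 + (-1)^2] = 14;\\
Z &= (1 + i, 2 - 3i) \Rightarrow \|Z\|^2 = [(1)^2 + (1)^2] + [(2)^2 + (-3)^2] = 15;\\
Z &= (2, 1 - i, 4i) \Rightarrow \|Z\|^2 = [(2)^2 + (0)^2] + [(1)^2 + (-1)^2] + [(0)^2 + (4)^2] = 22.
\end{aligned}
$$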
-22. 


If X is an n x 1 real vector then the dot product of X with X is

$$X.X = X'X = x_1^2 + \cdots + x_n^2.$$

Then the length of X, denoted by $\|X\|$, is given by
$$X'X = \|X\|^2. \tag{4.3.11}$$


If this relationship is to be preserved then we should redefine dot product of two vec- 
tors in the complex field slightly differently. Then when the vectors are in the complex 
field the dot product will be different from that in the real case. 


Definition 4.3.5 (The dot product in the complex case). Let U and V be vectors in the
complex space $C^n$. Let $U^*$ be the conjugate transpose of U (either take the transpose
first and then the complex conjugates of every element or take the complex conjugates
of every element first and then transpose the vector). The dot product of U with V is
defined as

$$U.V = U^*V = u_1^cv_1 + u_2^cv_2 + \cdots + u_n^cv_n. \tag{4.3.12}$$
Then the dot product of V with U will be

$$V.U = V^*U = v_1^cu_1 + \cdots + v_n^cu_n.$$
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Note that U.V need not be equal to V.U but if all the elements of U and V are real 
then U.V = V.U. If U = V then 


U.V = U.U = uju, +++ + usus 
and 

U*U = |u|? + -+ + us = UI? (4.3.13) 
which is consistent with (4.3.11). For example, 


2 1+i 


U*V =2(1 +i) + (1+ i)(2i) + (-3)(1 - 3i) = -9 + i, 
V*U =(1-i)(2) + (-2i)(1- i) + (1 + 3D) 3i) = -9 - i; 


1+2i -2 +i 
(2) U= i , Vz-2|1-i |, 
1-i 1-i 


U*V =(1-2i)(-2+ i) + (D1 -i) + (14+ D0-D) =14+ 4i, 
V*U =(-2-i)(14+ 21) + (14+ )@) + 0*0)0-i) =1- 4i. 


From the above examples it is evident that

\[
(U^*V)^* = V^*(U^*)^* = V^*U. \tag{4.3.14}
\]

If $U$ is changed to $c_1U$ and $V$ is changed to $c_2V$, where $c_1$ and $c_2$ are scalars, then the
dot product of $(c_1U)$ with $(c_2V)$ is given by

\[
(c_1U)^*(c_2V) = c_1^c c_2\, U^*V.
\]

That is, $c_1$ is changed to its complex conjugate whereas $c_2$ remains as it is.
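The definition (4.3.12) and the scalar rule above can be tried out numerically. The sketch below is illustrative only: the helper name `cdot` and the scalars `c1`, `c2` are assumptions made for this check, and the vectors are the ones whose products $-9 + i$ and $-9 - i$ appear in example (1).

```python
def cdot(u, v):
    """Dot product U.V = U*V = sum over j of conj(u_j) * v_j."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(complex(uj).conjugate() * vj for uj, vj in zip(u, v))


U = [2, 1 - 1j, 3j]
V = [1 + 1j, 2j, 1 - 3j]

uv = cdot(U, V)   # U*V
vu = cdot(V, U)   # V*U, the complex conjugate of U*V

# Scalar rule: (c1 U)*(c2 V) = conj(c1) c2 U*V, with hypothetical scalars.
c1, c2 = 1 + 2j, 3 - 1j
scaled = cdot([c1 * u for u in U], [c2 * v for v in V])

print(uv, vu, scaled == c1.conjugate() * c2 * uv)
```

Note that only the scalar attached to the first vector is conjugated, which is why the product is not symmetric in $U$ and $V$.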


4.3.6 Matrices with complex elements 


Let $Z$ be an $m \times n$ matrix where the elements are in the complex field. Then $Z$ will be
of the following form:

\[
Z = \begin{bmatrix}
x_{11} + iy_{11} & \cdots & x_{1n} + iy_{1n} \\
\vdots & & \vdots \\
x_{m1} + iy_{m1} & \cdots & x_{mn} + iy_{mn}
\end{bmatrix}
= X + iY, \quad X = (x_{ij}), \quad Y = (y_{ij})
\]

where the $x_{ij}$'s and $y_{ij}$'s are real. If any of the $y_{ij}$ is zero then the corresponding element
is a real quantity. If any of the $x_{ij}$ is zero then the corresponding element is a purely
imaginary quantity.


Definition 4.3.6 (Complex conjugate of a matrix). Let $Z = X + iY$ be an $m \times n$ matrix
in the complex field. Then $Z^c = X - iY$ is called its complex conjugate. That is, the
complex conjugate of every element in $Z$ is taken or if $Z = (z_{ij})$ then $Z^c = (z_{ij}^c)$.


Definition 4.3.7 (The conjugate transpose of a matrix). The transpose of the conjugate
matrix is called the conjugate transpose of the matrix. If $Z = X + iY$ then its conjugate
transpose, denoted by $Z^*$, is given by $Z^* = X' - iY'$ where the primes denote the
transposes. That is,

\[
Z^* = [(X + iY)^c]' = [X - iY]' = X' - iY' = [(X + iY)']^c.
\]

For example, two matrices in the complex field and their conjugates are

\[
\begin{bmatrix} 1 & 0 & 2+i \\ 1+i & 3 & 1-i \\ 2 & 3+i & 4-i \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 2-i \\ 1-i & 3 & 1+i \\ 2 & 3-i & 4+i \end{bmatrix};
\]
\[
(2+i,\ 1,\ 1-i), \quad (2-i,\ 1,\ 1+i).
\]

Their conjugate transposes are, respectively,

\[
\begin{bmatrix} 1 & 1-i & 2 \\ 0 & 3 & 3-i \\ 2-i & 1+i & 4+i \end{bmatrix}, \quad
\begin{bmatrix} 2-i \\ 1 \\ 1+i \end{bmatrix}.
\]

Let us see what happens if a square matrix in the complex field is equal to its conjugate 
transpose. Let 


\[
Z = X + iY \Rightarrow Z^* = X' - iY'.
\]


If Z = Z* then X = X' and Y = -Y'. That is, the real part of the matrix has to be sym- 
metric and the imaginary part of the matrix has to be skew symmetric. Such matrices 
are called Hermitian matrices. 


Definition 4.3.8 (A Hermitian matrix). If Z is equal to its conjugate transpose Z* then 
Z is called a Hermitian matrix. 


Definition 4.3.9 (A skew Hermitian matrix). If Z is equal to (-1) times its conjugate 
transpose, that is Z = —Z*, then Z is called skew Hermitian. 
Thus if Z is skew Hermitian then the real part is skew symmetric and the imaginary
part is symmetric.


(iv) The diagonal elements of a Hermitian matrix are real. The diagonal elements 
of a skew Hermitian matrix are purely imaginary or zero. The elements above the 
leading diagonal of a Hermitian matrix are the complex conjugates of the corre- 
sponding elements below the leading diagonal. The elements above the leading 
diagonal of a skew Hermitian matrix are minus one times the complex conjugates 
of the corresponding elements below the leading diagonal. 


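Let $Z = X + iY$, $X = (x_{ij})$, $Y = (y_{ij})$. Then

\[
Z = Z^* \Rightarrow x_{ij} = x_{ji},\ y_{ij} = -y_{ji},\ y_{ii} = 0, \text{ for all } i \text{ and } j;
\]
\[
Z = -Z^* \Rightarrow x_{ij} = -x_{ji},\ y_{ij} = y_{ji},\ x_{ii} = 0, \text{ for all } i \text{ and } j.
\]

For example,

\[
\text{(a)}\quad Z = \begin{bmatrix} 2 & 1+i & 2-i \\ 1-i & 5 & 3+2i \\ 2+i & 3-2i & 6 \end{bmatrix}, \quad Z \text{ is Hermitian},\ Z^* = Z;
\]
\[
\text{(b)}\quad Z = \begin{bmatrix} 1 & 5+4i \\ 5-4i & 7 \end{bmatrix}, \quad Z \text{ is Hermitian},\ Z^* = Z;
\]
\[
\text{(c)}\quad Z = \begin{bmatrix} 3i & 2-i \\ -2-i & -5i \end{bmatrix}, \quad Z \text{ is skew Hermitian},\ Z^* = -Z;
\]
\[
\text{(d)}\quad Z = \begin{bmatrix} 0 & 1+i & 2+i \\ -1+i & 2i & 1-i \\ -2+i & -1-i & -5i \end{bmatrix}, \quad Z \text{ is skew Hermitian},\ Z^* = -Z.
\]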
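The element-wise conditions above translate directly into a test on $Z^*$. The following Python sketch is only an illustration; the helper names `conj_transpose`, `is_hermitian` and `is_skew_hermitian` are assumptions, and the matrices are examples (a) and (d).

```python
def conj_transpose(z):
    """Return Z*: transpose the matrix and conjugate every element."""
    rows, cols = len(z), len(z[0])
    return [[complex(z[i][j]).conjugate() for i in range(rows)]
            for j in range(cols)]


def is_hermitian(z):
    """True if Z = Z* (Z is assumed to be square)."""
    zs = conj_transpose(z)
    n = len(z)
    return all(z[i][j] == zs[i][j] for i in range(n) for j in range(n))


def is_skew_hermitian(z):
    """True if Z = -Z* (Z is assumed to be square)."""
    zs = conj_transpose(z)
    n = len(z)
    return all(z[i][j] == -zs[i][j] for i in range(n) for j in range(n))


Z_a = [[2, 1 + 1j, 2 - 1j],
       [1 - 1j, 5, 3 + 2j],
       [2 + 1j, 3 - 2j, 6]]
Z_d = [[0, 1 + 1j, 2 + 1j],
       [-1 + 1j, 2j, 1 - 1j],
       [-2 + 1j, -1 - 1j, -5j]]

print(is_hermitian(Z_a), is_skew_hermitian(Z_d))
```

Exact comparison with `==` is adequate here because all entries are small integers; for matrices with rounded entries a tolerance would be needed.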


Some properties are immediate for conjugate transposes, denoted by $*$.

\[
(A + B)^* = A^* + B^*;
\]
\[
(AB)^* = B^*A^*;
\]
\[
(c_1A + c_2B)^* = c_1^c A^* + c_2^c B^*, \quad c_1, c_2 \text{ scalars}.
\]

For $n \times 1$ vectors $U$ and $V$

\[
U^*U = \|U\|^2 = |u_1|^2 + \cdots + |u_n|^2 \quad \text{(square of the length)};
\]
\[
U^*V \ne V^*U \text{ in general}, \quad (U^*V)^* = V^*U;
\]
\[
U^*V = 0 \Rightarrow U \text{ is orthogonal to } V;
\]
\[
U^*U = 1 \Rightarrow U \text{ is a normal vector};
\]
\[
U^*V = V^*U \text{ if } U^*V \text{ is real}.
\]

Definition 4.3.10 (Orthogonality). Let $U$ and $V$ be two vectors in the complex
space $C^n$. Then $U$ is said to be orthogonal to $V$ if $U^*V = 0$.


Note that $(U^*V)^* = V^*U = 0^* = 0$. Hence the condition $U^*V = 0$ also implies
$V^*U = 0$. That is, if $U$ is orthogonal to $V$ then $V$ is orthogonal to $U$ or they are
orthogonal to each other.


Definition 4.3.11 (Orthonormal system). Let $U_1,\dots,U_k$ be $k$ vectors in $C^n$. They form
an orthonormal system if $U_i^*U_j = 0$ for all $i$ and $j$, $i \ne j$, and $\|U_j\| = 1$ for $j = 1,2,\dots,k$.


Definition 4.3.12 (A unitary matrix). Consider an $n \times n$ matrix $Q$ whose columns are
orthonormal. Then $Q$ is called a unitary matrix, $Q^*Q = I$ or

\[
Q^*Q = I \Rightarrow Q^* = Q^{-1} \Rightarrow I = QQ^*.
\]


When the columns are orthonormal the rows are also orthonormal for an n x n 
unitary matrix. For example, 


lti li 
_| 2 2 
(a) a-| Li ddp 
ue 193 
lio hi 
e- [3 un QQ* =b, Q*Q-l 
2 2 
bi i dá 
ue x 
=i -1+i 
Wo SPE rg. SEL 
db uA LE 
v6 v6 v6 
li qi 1 
Hz E 
* 1i _ 4 * 0 * = 
Q* = o p Q(X-L Q'Q-L. 
Li i 1 
v6 2 6 


Definition 4.3.13 (Semiunitary matrices). Let $Q$ be an $n \times n$ unitary matrix such that
$Q^*Q = I$, $QQ^* = I$. Consider $m$ columns of $Q$, $m < n$, for example the first $m$ columns
$U_1,\dots,U_m$. Let $S$ be the matrix formed by these $m$ vectors, $S = (U_1,\dots,U_m)$. Then $S^*S = I_m$
and $S$ is called a semiunitary matrix or an element in the Stiefel manifold.


Note that $SS^* \ne I_n$ and $SS^*$ is an $n \times n$ matrix whereas $S^*S$ is an $m \times m$ matrix. We
could have also taken $m$ row vectors $V_1,\dots,V_m$ of $Q$ and created the matrix

\[
S_1 = \begin{bmatrix} V_1 \\ \vdots \\ V_m \end{bmatrix}.
\]

Then $S_1S_1^* = I_m$ whereas $S_1^*S_1 \ne I_n$. Thus $S_1$ is also a semiunitary matrix. A semiunitary
matrix is a matrix formed with a subset of a full orthonormal system of vectors. In our
illustrative examples above,


Ve v6 
(1) S= E , S'Szb, Sissemiunitary; 
1 2 
ve v6 
lio d 
v6 v6 
(2) S= H dH , S*S-L, Sissemiunitary; 
1 1 
ve v6 
l+i lri i 
(3) S,= E i |. S,Sj =, Sis semiunitary; 
2" 7A. 
li o hh 
(4) Si- | 1 23 : | SSi =h, 5S, is semiunitary. 
ve ve v6 


Exercises 4.3 


4.3.1. Mark the following complex numbers as points in an (x, y)-coordinate system.
$2 + 4i$, $2 - 4i$, $-2 + 4i$, $-2 - 4i$, $1 + i$, $i$, $2$, $-i$.


4.3.2. Compute the following: (a) $(z_1 + z_2) + z_3$, (b) $z_1 + (z_2 + z_3)$ where $z_1 = 1 + 3i$,
$z_2 = 2 - i$, $z_3 = 2 + 4i$.


4.3.3. For Z,,Z,Z3 in Exercise 4.3.2 compute the following: 


() zz QU 2% G) zfz, — (5 zi (5) 242925, 
1 
(6) Z12525, (7) [z)z523]", (8) CN (9) (AB )s, 
all roots. 


4.3.4. Compute the following for z; = 2i, z) -1— i, Z3 = 4: 


(1) (zzyz3)5, all 5 roots, (2) (z£z3z3)3, all roots, 
(3) (y, (4) (Z2), all roots. 


4.3.5. For the following vectors compute (1) the conjugate, (2) the conjugate transpose 


(a) [1+ 2i,3i,4], (b) 243i |, 


(c) [L-1210,4- 2i]. 


4.3.6. Compute the lengths of the vectors in Exercise 4.3.5. 


4.3.7. Mark the 5-th roots (all roots) of 1, 11, i, 2i on the same graph. 
4.3.8. Take the 6-th roots of 1. Let the roots be denoted by $1, w_1, w_2, \dots, w_5$ where
$w_1 = e^{2\pi i/6}$. Then show the following:

(1) $w_j = w_1^j$, $j = 2,3,4,5$,

(2) $1 + w_1 + \cdots + w_5 = 0$,

(3) plot all the 6 roots in the same graph.


4.3.9. With $w_1 = e^{2\pi i/n}$ show that the properties in Exercise 4.3.8 (1) and (2) hold for
any $n$ and plot the points $1, w_1, \dots, w_{n-1}$ and examine their relations to a circle of unit
radius in the complex plane.


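4.3.10. Show that the Fourier matrix

\[
F = \begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & i & i^2 & i^3 \\
1 & i^2 & i^4 & i^6 \\
1 & i^3 & i^6 & i^9
\end{bmatrix}
\quad \text{has the inverse} \quad
F^{-1} = \frac{1}{4}\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & (-i) & (-i)^2 & (-i)^3 \\
1 & (-i)^2 & (-i)^4 & (-i)^6 \\
1 & (-i)^3 & (-i)^6 & (-i)^9
\end{bmatrix}.
\]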


1 w 1 w 
1 1 1 1 1 1 
F;=|1 w, wi| thenFj'=-]1 wj! wg 
3 3 W3 = 3 3 P 
1 wi wj 1 w;? w;4 
1 1 1 1 
1 w, w2 wi 
[EE - wi w2n-D 
1 wil wAn-D wir? 
then 
1 1 1 1 
wil Wat. ns. WA? 
Fis =|). wg Wee du WED 
4.3.12. Check whether the following systems of vectors are linearly independent:
(a) $U_1 = (1-i,\ 1+i,\ 2)$, $U_2 = (1+i,\ 1-i,\ 5)$, $U_3 = (2+i,\ 3,\ 1-i)$;
(b) $U_1 = (1-i,\ 1+i,\ 2)$, $U_2 = (2+3i,\ 2+i,\ 1-i)$, $U_3 = (4+i,\ 4+2i,\ 5-i)$.


4.3.13. Construct four $4 \times 1$ vectors in $C^4$ such that they form an orthonormal (unitary)
system.


4.3.14. Construct 3 orthonormal $4 \times 1$ vectors $U_1, U_2, U_3$ in $C^4$ such that
$(U_1, U_2, U_3)^*(U_1, U_2, U_3) = I_3$.


4.3.15. Construct 3 examples each of (1) $2 \times 2$, (2) $3 \times 3$, (3) $4 \times 4$ Hermitian matrices.


4.3.16. Construct 3 examples each of (1) $2 \times 2$, (2) $3 \times 3$, (3) $4 \times 4$ skew Hermitian
matrices.


4.3.17. Let $A$ be an $n \times n$ matrix whose elements are in the complex field. Show that
one can always construct a Hermitian matrix $B$ as a function of $A$.


4.3.18. Let $A$ be an $n \times n$ matrix whose elements are in the complex field. Show that
one can always construct a skew Hermitian matrix $C$ as a function of $A$.


4.3.19. If $A$ is an $n \times n$ matrix and $A^*$ its conjugate transpose then show that (1) $B =
(A + A^*)/2$ is Hermitian and (2) $C = (A - A^*)/2$ is skew Hermitian.


4.3.20. Compute the eigenvalues and the corresponding eigenvectors of the matrix 
re ese 
-1 2+i 
4.3.21. Compute the eigenvalues of the matrices in one example each in (1) and (2) of
Exercise 4.3.15. Is there any interesting property that you see for these eigenvalues?


4.3.22. Compute the eigenvalues of the matrices in one example each in (1) and (2) of
Exercise 4.3.16. Is there any interesting property for these eigenvalues? What can you
say about the singularity of the matrices in your examples?


4.3.23. If $A$ is Hermitian show that $A^k$ is Hermitian for $k$ a positive integer.


4.3.24. If $A^2$ is Hermitian is $A$ Hermitian? Justify your answer.


4.3.25. Find the rank of the following matrix:

\[
A = \begin{bmatrix} 2+3i & 2 & 5-i \\ 2-i & 1+i & 3 \\ 2i & -4 & 1-i \end{bmatrix}
\]

and if it is nonsingular then evaluate the regular inverse of $A$.


4.3.26. Find nonsingular matrices R and S such that B = RAS where


Ale ole Hes ee IL 
3-i i 34+i 1-i 


4.3.27. Solve the following system of linear equations: 
(2+ 0x, -x;-(5-ix4-21-i 
3iX, + (1+ I)x,-X3=2+4+i 

X-X + (0 x4 =7. 


4.3.28. Let $A$ and $B$ be $n \times n$ matrices. Show that $AB$ and $BA$ have the same
characteristic polynomials and the same eigenvalues.


4.3.29. For square matrices $A$ and $B$ if $AB = BA$ then show that $A$ and $B$ have at least
one common eigenvector.


4.3.30. If $\lambda_j$ is an eigenvalue of an $n \times n$ matrix $A$ and if $P(A)$ is any polynomial in $A$
then show that $P(\lambda_j)$ is an eigenvalue of $P(A)$. Here, if

\[
P(\lambda) = b_0 + b_1\lambda + \cdots + b_k\lambda^k \quad \text{then} \quad P(A) = b_0I + b_1A + \cdots + b_kA^k.
\]


4.4 More properties of matrices in the complex field 


Since we have introduced vectors and matrices whose elements are in the complex 
field we are in a better position to derive more properties of matrices. One of the prop- 
erties that we would like to investigate is concerned with the eigenvalues of symmetric 
and Hermitian matrices. What about the eigenvalues of a real symmetric matrix? From 
our numerical examples in Sections 4.1 and 4.2 we have seen that even if the elements 
of a matrix are real their eigenvalues as well as eigenvectors could be in the complex 
space. 


4.4.1 Eigenvalues of symmetric and Hermitian matrices 


If we confine our discussion to symmetric matrices with real elements (real symmetric
matrices) do we have any interesting results? Let $A = A'$ and real. Let $\lambda_1$ be an
eigenvalue of $A$ and $X_1$ a corresponding eigenvector. Then

\[
AX_1 = \lambda_1X_1 \Rightarrow X_1^*A^* = \lambda_1^cX_1^* \tag{a}
\]

where a $*$ indicates the conjugate transpose and $c$ indicates the complex conjugate.
But since $A$ is real and symmetric $A^* = A$ itself. Let us postmultiply (a) by $X_1$. Then

\[
X_1^*AX_1 = \lambda_1^cX_1^*X_1. \tag{b}
\]

But $X_1^*X_1 = \|X_1\|^2$ is a real nonzero quantity. Now premultiply $AX_1 = \lambda_1X_1$ by $X_1^*$ to obtain

\[
X_1^*AX_1 = \lambda_1X_1^*X_1. \tag{c}
\]

From (b) and (c)

\[
(\lambda_1 - \lambda_1^c)X_1^*X_1 = 0 \Rightarrow \lambda_1 = \lambda_1^c
\]

since $X_1^*X_1 = \|X_1\|^2 \ne 0$. Then $\lambda_1 = \lambda_1^c$ means that $\lambda_1$ is real. Thus we have the following
important property.


(i) The eigenvalues of a real symmetric matrix are all real. 


Since Hermitian matrices in the complex field and the real symmetric matrices have
many parallel properties let us look at the eigenvalues of a Hermitian matrix $A$. If $A$ is
Hermitian then $A = A^*$. Let

\[
A = A_1 + iA_2 \quad \text{then} \quad A = A^* \Rightarrow A_1 = A_1',\ A_2 = -A_2'.
\]

Let $\lambda_1$ be an eigenvalue of $A$ and $X_1$ a corresponding eigenvector, $\lambda_1^c$ the complex
conjugate of $\lambda_1$ and $X_1^*$ the conjugate transpose of $X_1$. Then

\[
AX_1 = \lambda_1X_1 \Rightarrow X_1^*A^* = \lambda_1^cX_1^*
\Rightarrow X_1^*A = \lambda_1^cX_1^* \ \text{ since } A = A^*
\Rightarrow X_1^*AX_1 = \lambda_1^cX_1^*X_1. \tag{d}
\]

From $AX_1 = \lambda_1X_1$ we have

\[
X_1^*AX_1 = \lambda_1X_1^*X_1. \tag{e}
\]

From (d) and (e), $\lambda_1 = \lambda_1^c$ or $\lambda_1$ is real. Thus we have another important result. Note that
the step in (d) above holds if the matrix $A$ is real symmetric or Hermitian.
If $A$ is symmetric but with some of the elements complex then (d) and (e) need not
hold, because the conjugate transpose of such a matrix need not be equal to the original
matrix. Hence symmetry is not sufficient. Either $A$ should be real and symmetric or
Hermitian. Only then can we guarantee that the eigenvalues are real.


(ii) The eigenvalues of a Hermitian matrix are all real. 


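Example 4.4.1. Compute the eigenvalues and the eigenvectors of

\[
A = \begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}.
\]

Solution 4.4.1. Note that $A$ is Hermitian since $A = A^*$, and hence we can expect real
eigenvalues. Consider

\[
|A - \lambda I| = 0 \Rightarrow \begin{vmatrix} 2-\lambda & 1+i \\ 1-i & 3-\lambda \end{vmatrix} = 0
\Rightarrow (2-\lambda)(3-\lambda) - (1+i)(1-i) = 0
\]
\[
\Rightarrow \lambda^2 - 5\lambda + 4 = 0 \quad \text{or} \quad \lambda_1 = 1,\ \lambda_2 = 4
\]

are the eigenvalues. For $\lambda_1 = 1$,

\[
(A - \lambda_1I)X = O \Rightarrow \begin{bmatrix} 2-1 & 1+i \\ 1-i & 3-1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow \begin{bmatrix} 1 & 1+i \\ 1-i & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]

Are the two rows of $A - \lambda_1I$ linearly dependent? They must be, otherwise we made
some computational errors somewhere. Multiply the first row by $(1 - i)$, that is,

\[
[(1-i),\ (1-i)(1+i)] = [1-i,\ 2]
\]

which is the second row. Taking the first equation

\[
x_1 + (1+i)x_2 = 0 \Rightarrow X_1 = \begin{bmatrix} -1-i \\ 1 \end{bmatrix}
\]

is one solution. Now, consider $\lambda_2 = 4$.

\[
(A - \lambda_2I)X = O \Rightarrow \begin{bmatrix} 2-4 & 1+i \\ 1-i & 3-4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow (1-i)x_1 - x_2 = 0 \Rightarrow X_2 = \begin{bmatrix} 1 \\ 1-i \end{bmatrix}
\]

is a vector corresponding to $\lambda_2$. A matrix of eigenvectors in this case is given by

\[
Q = \begin{bmatrix} -1-i & 1 \\ 1 & 1-i \end{bmatrix} \quad \text{with its inverse} \quad
Q^{-1} = \frac{1}{3}\begin{bmatrix} -1+i & 1 \\ 1 & 1+i \end{bmatrix}.
\]

Hence in this case we can have a representation of the form $A = Q\Lambda Q^{-1}$, that is,

\[
\begin{bmatrix} 2 & 1+i \\ 1-i & 3 \end{bmatrix}
= \frac{1}{3}\begin{bmatrix} -1-i & 1 \\ 1 & 1-i \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}
\begin{bmatrix} -1+i & 1 \\ 1 & 1+i \end{bmatrix}.
\]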
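The representation $A = Q\Lambda Q^{-1}$ of Example 4.4.1 can be confirmed by direct multiplication. The sketch below is an illustrative check in plain Python; the helper `matmul` and the variable names are assumptions, and $3Q^{-1}$ is used instead of $Q^{-1}$ so that every entry stays an exact integer.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


A = [[2, 1 + 1j], [1 - 1j, 3]]
Q = [[-1 - 1j, 1], [1, 1 - 1j]]            # columns are X_1 and X_2
LAMBDA = [[1, 0], [0, 4]]                  # eigenvalues 1 and 4
Q_INV_TIMES_3 = [[-1 + 1j, 1], [1, 1 + 1j]]

# Q * Lambda * (3 Q^{-1}) should equal 3A, and Q * (3 Q^{-1}) should equal 3I.
product = matmul(matmul(Q, LAMBDA), Q_INV_TIMES_3)
three_A = [[3 * entry for entry in row] for row in A]
identity_times_3 = matmul(Q, Q_INV_TIMES_3)

print(product == three_A, identity_times_3 == [[3, 0], [0, 3]])
```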
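Example 4.4.2. Compute the eigenvalues and eigenvectors of

\[
A = \begin{bmatrix} 2i & 1+i \\ -1+i & 3i \end{bmatrix}.
\]

Solution 4.4.2. Note that our matrix $A$ here is skew Hermitian. Consider

\[
|A - \lambda I| = 0 \Rightarrow \begin{vmatrix} 2i-\lambda & 1+i \\ -1+i & 3i-\lambda \end{vmatrix} = 0
\Rightarrow \lambda^2 - 5i\lambda - 4 = 0
\]
\[
\Rightarrow \lambda = \frac{5i \pm \sqrt{(5i)^2 - (4)(-4)}}{2} = \frac{5i \pm 3i}{2}
\Rightarrow \lambda_1 = i,\ \lambda_2 = 4i
\]

are the eigenvalues. An eigenvector corresponding to $\lambda_1 = i$ is available from

\[
\begin{bmatrix} 2i-i & 1+i \\ -1+i & 3i-i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow ix_1 + (1+i)x_2 = 0.
\]

For $x_2 = 1$, $x_1 = -\frac{(1+i)}{i} = -1+i$. Hence one vector is

\[
X_1 = \begin{bmatrix} -1+i \\ 1 \end{bmatrix}.
\]

Now, consider

\[
(A - \lambda_2I)X = O \Rightarrow (-1+i)x_1 - ix_2 = 0 \Rightarrow x_1 = \frac{i}{(-1+i)}x_2 = \frac{(1-i)}{2}x_2.
\]

One vector corresponding to $\lambda_2 = 4i$ is then

\[
X_2 = \begin{bmatrix} \frac{1-i}{2} \\ 1 \end{bmatrix}.
\]

A matrix of eigenvectors is therefore

\[
Q = \begin{bmatrix} -1+i & \frac{1-i}{2} \\ 1 & 1 \end{bmatrix} \quad \text{with} \quad
Q^{-1} = \frac{1}{3}\begin{bmatrix} -1-i & 1 \\ 1+i & 2 \end{bmatrix}.
\]

Note that $A$ can be written as $A = Q\Lambda Q^{-1}$ in this case also. (Verification is left to the
student.)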


One observation can be made from this example. We get the eigenvalues of this
skew Hermitian matrix as purely imaginary. Is this a general property? Let us
investigate this further. Let $A$ be a general skew Hermitian matrix. Then

\[
A^* = -A, \quad A = A_1 + iA_2 \Rightarrow A_1' = -A_1,\ A_2' = A_2.
\]

Let $X_1$ be an eigenvector of $A$ corresponding to an eigenvalue $\lambda_1$. Then

\[
AX_1 = \lambda_1X_1 \Rightarrow X_1^*AX_1 = \lambda_1X_1^*X_1. \tag{a}
\]
\[
(AX_1)^* = X_1^*A^* = -X_1^*A \Rightarrow -X_1^*AX_1 = \lambda_1^cX_1^*X_1. \tag{b}
\]

From (a) and (b) we have, since $X_1^*X_1 = \|X_1\|^2 \ne 0$,

\[
\lambda_1 + \lambda_1^c = 0.
\]

This means the real part of $\lambda_1$ is zero or $\lambda_1 = 0$. Thus the eigenvalues of a skew
Hermitian matrix have to be purely imaginary or zero.


(iii) The eigenvalues of a skew Hermitian matrix are purely imaginary or zero. 
(iv) The eigenvalues of a real skew symmetric matrix are purely imaginary or zero. 
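Properties (ii), (iii) and (iv) can be observed numerically for $2 \times 2$ matrices, where the eigenvalues follow from the quadratic $\lambda^2 - \mathrm{tr}(A)\lambda + |A| = 0$. The Python sketch below is illustrative only; the function name `eig2` is an assumption, and the real skew symmetric test matrix is a hypothetical example not taken from the text.

```python
import cmath


def eig2(a):
    """Eigenvalues of a 2 x 2 matrix from lambda^2 - tr(A) lambda + |A| = 0."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr - root) / 2, (tr + root) / 2


hermitian = [[2, 1 + 1j], [1 - 1j, 3]]          # matrix of Example 4.4.1
skew_hermitian = [[2j, 1 + 1j], [-1 + 1j, 3j]]  # matrix of Example 4.4.2
real_skew = [[0, 2], [-2, 0]]                   # hypothetical real skew symmetric matrix

for name, m in [("Hermitian", hermitian),
                ("skew Hermitian", skew_hermitian),
                ("real skew symmetric", real_skew)]:
    print(name, eig2(m))
```

The first pair of roots has zero imaginary part, while the other two pairs have zero real part, in agreement with the properties stated above.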


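Since the complex roots of a polynomial with real coefficients appear in conjugate pairs
we have an interesting result for real matrices. A real skew symmetric matrix cannot have
an odd number of nonzero purely imaginary roots. Then if the order $n$ is odd and if the
matrix is real skew symmetric then it must be singular because at least one root
must be zero. This argument does not carry over to skew Hermitian matrices with complex
elements, whose eigenvalues need not occur in conjugate pairs; the eigenvalues $i$ and $4i$
in Example 4.4.2 illustrate this.


(v) If an $n \times n$ matrix $A$ is real skew symmetric then $A$ is singular
if $n$ is odd and the determinant of $A$, being the product of the eigenvalues, is the square of
a real number if $n$ is even. If $A$ is real skew symmetric or skew Hermitian then $I + A$ and
$I - A$ are nonsingular.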


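Let us examine the eigenvalues of a unitary matrix. Let $Q$ be a unitary matrix. Then

\[
QQ^* = I, \quad Q^*Q = I.
\]

Let $\lambda_1$ be an eigenvalue of $Q$ and $X_1$ an eigenvector corresponding to $\lambda_1$. Then

\[
QX_1 = \lambda_1X_1 \Rightarrow X_1^*Q^* = \lambda_1^cX_1^* \tag{a}
\]
\[
\Rightarrow X_1^*Q^*X_1 = \lambda_1^cX_1^*X_1.
\]

From (a), premultiplying the first part by $Q^*$, we have

\[
X_1 = \lambda_1Q^*X_1 \Rightarrow X_1^*X_1 = \lambda_1X_1^*Q^*X_1 = \lambda_1\lambda_1^cX_1^*X_1
\Rightarrow \lambda_1\lambda_1^c = 1 \Rightarrow |\lambda_1| = 1.
\]

(vi) The eigenvalues of a unitary matrix are such that the absolute values of the roots
are 1, that is, if $\lambda$ is a root then $\lambda\lambda^c = 1$.
(vii) Every real square matrix can be written as a sum of a symmetric and a skew
symmetric matrix. [Let $B = \frac{1}{2}(A + A')$ and $C = \frac{1}{2}(A - A')$. Then $A = B + C$, $B' = B$, $C' = -C$.]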
(viii) Every square matrix can be written as the sum of a Hermitian and a skew
Hermitian matrix.
(ix) Every square matrix can be written as a sum of two nonsingular matrices.


We had seen from Chapter 2 that every square matrix can be written as $A = PDQ$
through elementary operations, where $P$ and $Q$ are nonsingular, $D$ is diagonal and $A$
is $n \times n$. Let $D = \mathrm{diag}(d_1,\dots,d_r,0,\dots,0)$ where $d_j \ne 0$, $j = 1,\dots,r$. Write, for example,

\[
D = D_1 + D_2, \quad D_1 = \mathrm{diag}\Big(\frac{d_1}{2},\dots,\frac{d_r}{2},d_{r+1},\dots,d_n\Big),
\]
\[
D_2 = \mathrm{diag}\Big(\frac{d_1}{2},\dots,\frac{d_r}{2},-d_{r+1},\dots,-d_n\Big)
\]

where $d_j \ne 0$, $j = r+1,\dots,n$, are arbitrary. Then both $PD_1Q$ and $PD_2Q$ are nonsingular and the sum
is $A$.


(x) If $P$ and $Q$ are unitary $n \times n$ matrices then $PQ$ as well as $QP$ are unitary matrices.
(xi) If the determinant of $A$, that is, $|A| = a + ib$ then $|A^*| = a - ib$ which means that
$|A|$ is real if $A = A^*$ (Hermitian).

(xii) Complex eigenvalues of a real matrix appear in conjugate pairs. If $a + ib$ is an eigenvalue
of a given real matrix $A$ then $a - ib$ is also an eigenvalue of $A$.


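Example 4.4.3. Express $A$ as a sum of a symmetric and a skew symmetric matrix and
$B$ as the sum of a Hermitian and a skew Hermitian matrix, where

\[
A = \begin{bmatrix} 1 & 2 & -2 \\ 3 & 1 & 5 \\ 2 & 4 & -2 \end{bmatrix}, \quad
B = \begin{bmatrix} 2+3i & 1-i & 2+i \\ 1+2i & 3+i & 1+2i \\ 2-i & 2+5i & 7i \end{bmatrix}.
\]

Solution 4.4.3. Let

\[
A_1 = \frac{1}{2}[A + A'] = \frac{1}{2}\left\{\begin{bmatrix} 1 & 2 & -2 \\ 3 & 1 & 5 \\ 2 & 4 & -2 \end{bmatrix} + \begin{bmatrix} 1 & 3 & 2 \\ 2 & 1 & 4 \\ -2 & 5 & -2 \end{bmatrix}\right\}
= \begin{bmatrix} 1 & \frac{5}{2} & 0 \\ \frac{5}{2} & 1 & \frac{9}{2} \\ 0 & \frac{9}{2} & -2 \end{bmatrix} = A_1'.
\]

Let

\[
A_2 = \frac{1}{2}[A - A'] = \frac{1}{2}\left\{\begin{bmatrix} 1 & 2 & -2 \\ 3 & 1 & 5 \\ 2 & 4 & -2 \end{bmatrix} - \begin{bmatrix} 1 & 3 & 2 \\ 2 & 1 & 4 \\ -2 & 5 & -2 \end{bmatrix}\right\}
= \begin{bmatrix} 0 & -\frac{1}{2} & -2 \\ \frac{1}{2} & 0 & \frac{1}{2} \\ 2 & -\frac{1}{2} & 0 \end{bmatrix} = -A_2', \quad A = A_1 + A_2.
\]

Let

\[
B_1 = \frac{1}{2}[B + B^*]
= \frac{1}{2}\begin{bmatrix} 2+3i & 1-i & 2+i \\ 1+2i & 3+i & 1+2i \\ 2-i & 2+5i & 7i \end{bmatrix}
+ \frac{1}{2}\begin{bmatrix} 2-3i & 1-2i & 2+i \\ 1+i & 3-i & 2-5i \\ 2-i & 1-2i & -7i \end{bmatrix}
\]
\[
= \begin{bmatrix} 2 & 1-\frac{3}{2}i & 2+i \\ 1+\frac{3}{2}i & 3 & \frac{3}{2}-\frac{3}{2}i \\ 2-i & \frac{3}{2}+\frac{3}{2}i & 0 \end{bmatrix} = B_1^*.
\]

Let

\[
B_2 = \frac{1}{2}[B - B^*]
= \begin{bmatrix} 3i & \frac{1}{2}i & 0 \\ \frac{1}{2}i & i & -\frac{1}{2}+\frac{7}{2}i \\ 0 & \frac{1}{2}+\frac{7}{2}i & 7i \end{bmatrix} = -B_2^*, \quad B = B_1 + B_2.
\]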
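Property (viii) and the computation for $B$ in Example 4.4.3 can be reproduced with a few lines of Python. This is only a sketch under the stated formulas $B_1 = (B + B^*)/2$ and $B_2 = (B - B^*)/2$; the helper names `conj_transpose` and `hermitian_parts` are assumptions.

```python
def conj_transpose(z):
    """Return Z* for a square matrix stored as a list of rows."""
    n = len(z)
    return [[complex(z[i][j]).conjugate() for i in range(n)] for j in range(n)]


def hermitian_parts(b):
    """Return (B1, B2): the Hermitian part (B + B*)/2 and the skew Hermitian part (B - B*)/2."""
    bs = conj_transpose(b)
    n = len(b)
    b1 = [[(b[i][j] + bs[i][j]) / 2 for j in range(n)] for i in range(n)]
    b2 = [[(b[i][j] - bs[i][j]) / 2 for j in range(n)] for i in range(n)]
    return b1, b2


B = [[2 + 3j, 1 - 1j, 2 + 1j],
     [1 + 2j, 3 + 1j, 1 + 2j],
     [2 - 1j, 2 + 5j, 7j]]

B1, B2 = hermitian_parts(B)
for row in B1:
    print(row)
for row in B2:
    print(row)
```

The same two formulas with the transpose in place of the conjugate transpose give the symmetric and skew symmetric parts of a real matrix, as in property (vii).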


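Example 4.4.4. Write the following matrices as the sum of two nonsingular matrices:

\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & -1 \\ 0 & 1 & 2 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & -1 \\ 3 & 1 & 0 \end{bmatrix}.
\]

Solution 4.4.4. By inspection $|A| \ne 0$, $|B| = 0$. That is, $A$ is nonsingular and $B$ is
singular. Then $A$ can always be written as

\[
A = A_1 + A_2, \quad A_1 = \alpha A, \quad A_2 = (1-\alpha)A, \quad \alpha \ne 0,\ \alpha \ne 1,
\]

where both $A_1$ and $A_2$ are nonsingular. Hence we look at $B$. One way of doing it is
to look for a representation $B = QDQ^{-1}$ where $Q$ is a matrix of eigenvectors. For this
procedure we need the eigenvectors and we cannot tell in advance whether $Q$ will
be nonsingular. Hence consider pre and post multiplications by elementary matrices.
This process will always produce two nonsingular matrices. Let

\[
F_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_1B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -3 \\ 3 & 1 & 0 \end{bmatrix},
\]
\[
F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}, \quad
F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}, \quad
F_2F_1B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -3 \\ 0 & 1 & -3 \end{bmatrix},
\]
\[
F_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad
F_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad
F_3F_2F_1B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix},
\]
\[
F_4 = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_4^{-1} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_3F_2F_1BF_4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{bmatrix},
\]
\[
F_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_5^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}, \quad
F_3F_2F_1BF_4F_5 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = D.
\]

Then

\[
B = F_1^{-1}F_2^{-1}F_3^{-1}DF_5^{-1}F_4^{-1}.
\]

Let

\[
P = F_1^{-1}F_2^{-1}F_3^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1 & 1 \end{bmatrix},
\]
\[
Q = F_5^{-1}F_4^{-1}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}.
\]

Here $P$ and $Q$ are nonsingular since they are products of the basic elementary matrices.
But $D$ can be written in many ways as the sum of two nonsingular matrices. For
example,

\[
D = D_1 + D_2, \quad
D_1 = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad
D_2 = \begin{bmatrix} -2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -4 \end{bmatrix}.
\]

Then

\[
B = B_1 + B_2, \quad B_1 = PD_1Q, \quad B_2 = PD_2Q.
\]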
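The decomposition in Example 4.4.4 can be verified by multiplying out $PDQ$, $PD_1Q$ and $PD_2Q$. The Python sketch below is an illustration only; the helpers `matmul` and `det3` are assumptions written for $3 \times 3$ integer matrices.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


B = [[1, 0, 1], [2, 1, -1], [3, 1, 0]]
P = [[1, 0, 0], [2, 1, 0], [3, 1, 1]]
Q = [[1, 0, 1], [0, 1, -3], [0, 0, 1]]
D = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
D1 = [[3, 0, 0], [0, 2, 0], [0, 0, 4]]
D2 = [[-2, 0, 0], [0, -1, 0], [0, 0, -4]]

B1 = matmul(matmul(P, D1), Q)
B2 = matmul(matmul(P, D2), Q)
B_sum = [[B1[i][j] + B2[i][j] for j in range(3)] for i in range(3)]

print(matmul(matmul(P, D), Q) == B)        # B = P D Q
print(det3(B), det3(B1), det3(B2))         # B singular, B1 and B2 nonsingular
print(B_sum == B)                          # B = B1 + B2
```

Since $|P| = |Q| = 1$ here, the determinants of $B_1$ and $B_2$ are simply the products of the diagonal elements of $D_1$ and $D_2$.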
Example 4.4.5. Show that one eigenvalue of a singly stochastic matrix (Markov matrix)
is 1 and illustrate it by computing the eigenvalues of

\[
A = \begin{bmatrix} 0.2 & 0.5 & 0.3 \\ 0.5 & 0 & 0.5 \\ 0.4 & 0.2 & 0.4 \end{bmatrix}.
\]

Solution 4.4.5. Consider the equation $|A - \lambda I| = 0$. Add all the columns of $A - \lambda I$ to
the first column, the determinant remains the same. Since each row of $A$ sums to 1, the first
column becomes $1 - \lambda$ repeated. Take out $1 - \lambda$. [If it is the columns that sum to 1 then add
all rows to the first row. Then take out $1 - \lambda$ from the first row.] Thus, in general, when the
matrix is a Markov matrix one eigenvalue is 1. For the example above

\[
|A - \lambda I| = \begin{vmatrix} 0.2-\lambda & 0.5 & 0.3 \\ 0.5 & -\lambda & 0.5 \\ 0.4 & 0.2 & 0.4-\lambda \end{vmatrix}
= \begin{vmatrix} 1-\lambda & 0.5 & 0.3 \\ 1-\lambda & -\lambda & 0.5 \\ 1-\lambda & 0.2 & 0.4-\lambda \end{vmatrix}
= (1-\lambda)\begin{vmatrix} 1 & 0.5 & 0.3 \\ 1 & -\lambda & 0.5 \\ 1 & 0.2 & 0.4-\lambda \end{vmatrix}.
\]

Add $-0.5$ times the first column to the second column and $-0.3$ times the first column
to the third column. Then

\[
|A - \lambda I| = (1-\lambda)\begin{vmatrix} 1 & 0 & 0 \\ 1 & -\lambda-0.5 & 0.2 \\ 1 & -0.3 & 0.1-\lambda \end{vmatrix}
= (1-\lambda)\begin{vmatrix} -\lambda-0.5 & 0.2 \\ -0.3 & 0.1-\lambda \end{vmatrix}
\]
\[
= (1-\lambda)[(-\lambda-0.5)(0.1-\lambda) - (-0.3)(0.2)]
= (1-\lambda)(\lambda^2 + 0.4\lambda + 0.01).
\]

The roots are $\lambda_1 = 1$, $\lambda_2 = -0.2 + \sqrt{0.03}$, $\lambda_3 = -0.2 - \sqrt{0.03}$. Note that $|\lambda_2| < 1$, $|\lambda_3| < 1$.

(xiii) One eigenvalue of a singly stochastic matrix (Markov matrix) is 1.
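The roots found in Example 4.4.5 can be substituted back into $|A - \lambda I|$ as a check. The following Python sketch is illustrative; the helper names `det3` and `char_poly` are assumptions, and a small tolerance is used because the entries are decimal fractions.

```python
import math

A = [[0.2, 0.5, 0.3],
     [0.5, 0.0, 0.5],
     [0.4, 0.2, 0.4]]


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def char_poly(a, lam):
    """Evaluate |A - lam I| for a 3 x 3 matrix."""
    shifted = [[a[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)


row_sums = [sum(row) for row in A]                      # each should be 1
roots = [1.0, -0.2 + math.sqrt(0.03), -0.2 - math.sqrt(0.03)]
trace = sum(A[i][i] for i in range(3))

print(row_sums)
print([char_poly(A, r) for r in roots])                 # all close to zero
print(sum(roots), trace)                                # sum of roots equals the trace
```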


Note that the sum of the eigenvalues equals the trace. But the diagonal elements of 
a Markov matrix are non-negative and less than or equal to 1. The maximum value 
attainable for the trace of an n x n matrix is n, out of which one eigenvalue is 1. Hence 
the maximum value for the sum of the remaining eigenvalues is only n - 1. We can 
show by using the property that powers of Markov matrices are also Markov matrices 
that the remaining roots satisfy the condition |Aj| < 1 for all j. 
4.4.2 Definiteness of matrices


So far we have given three definitions for the definiteness of a real symmetric matrix,
one definition in terms of quadratic forms, another in terms of determinants of the
leading submatrices and a third in terms of the eigenvalues. Are these three definitions
equivalent? For example an $n \times n$ matrix $A$ with real elements and symmetric is positive
definite if

(1) the quadratic form $X'AX$ remains positive for all possible non-null $X$ (definition 1);
(2) the leading or principal minors of $A$ are all positive (definition 2);
(3) the eigenvalues are all positive (definition 3).


Are definitions (1) and (3) equivalent? When $A$ is real symmetric it is trivial to show that
there exists a full set of orthonormal eigenvectors (part of it is established in property
(x) of Section 4.2 and for the remaining part see Exercise 4.4.6 (7)) so that

\[
A = PDP', \quad PP' = I, \quad P'P = I
\]

and $D$ is a diagonal matrix with the diagonal elements being the eigenvalues of $A$.
Then

\[
X'AX = X'PDP'X = Y'DY = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2, \quad Y = P'X, \quad D = \mathrm{diag}(\lambda_1,\dots,\lambda_n).
\]

For arbitrary real non-null $X$, which means for arbitrary real non-null $Y$, if $X'AX > 0$ then

\[
\lambda_1y_1^2 + \cdots + \lambda_ny_n^2 > 0
\]

for all non-null $Y' = (y_1,\dots,y_n)$. Put $y_1 = 0,\dots,y_{j-1} = 0$, $y_{j+1} = 0,\dots,y_n = 0$ with $y_j \ne 0$. Then
$\lambda_jy_j^2 > 0$. Since $y_j$ is real, this means that $\lambda_j > 0$, $j = 1,\dots,n$. By retracing the steps the
converse is true. Note that the matrix appearing in the quadratic form can be taken as
symmetric without any loss of generality.

Now let us look at the definitions (1) and (2). Assume that $X'AX > 0$ for all possible
non-null $X$, $A = (a_{ij}) = A'$. Let $X' = (x_1,\dots,x_n)$. Put $x_2 = 0 = \cdots = x_n$. Then $a_{11}x_1^2 > 0$. Then
$a_{11} > 0$ since $x_1^2 > 0$ for real nonzero $x_1$. Now put $x_3 = 0 = \cdots = x_n$. Then

\[
[x_1, x_2]\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} > 0
\]

for all $x_1$ and $x_2$, not both zero. This means that the leading submatrix

\[
A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

is positive definite. Then by definition (3) the eigenvalues of $A_2$ must be positive and
hence its determinant is positive, $|A_2| > 0$. Continuing like this the determinants of all
the leading sub-matrices are positive.
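Definition 2 is easy to apply mechanically for small matrices. The Python sketch below computes the leading minors by cofactor expansion; the function names are assumptions, the test matrices other than the one from Example 4.4.6 are hypothetical, and the approach is intended only for small real symmetric matrices.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def leading_minors(a):
    """Determinants |A_1|, |A_2|, ..., |A_n| of the leading submatrices."""
    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def is_positive_definite(a):
    """Definition 2: all leading minors positive (A assumed real symmetric)."""
    return all(d > 0 for d in leading_minors(a))


print(leading_minors([[1, -1], [-1, 2]]))                        # matrix of Example 4.4.6
print(leading_minors([[1, 2], [2, 1]]))                          # hypothetical indefinite matrix
print(leading_minors([[2, -1, 0], [-1, 2, -1], [0, -1, 2]]))     # hypothetical 3 x 3 matrix
```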

When looking at negative definiteness note that when all the eigenvalues of a matrix
are negative the product of an odd number of them will be negative and the product
of an even number of them will be positive. We will be looking at the eigenvalues of
the leading sub-matrices $A_1, A_2, \dots, A_n$ when following through the above arguments
to show the equivalence of all the definitions for negative definiteness.


(xiv) If an $n \times n$ matrix $A = A' = (a_{ij})$ is real and positive definite then $a_{ii} > 0$, $i =
1,\dots,n$.

(xv) When the $n \times n$ matrix $A$ is real symmetric then there exists a full set of $n$
orthonormal eigenvectors or there exists a matrix $P$ such that $P'AP = \Lambda$, $PP' = I$,
$P'P = I$ where $\Lambda$ is diagonal with the diagonal elements being the eigenvalues of $A$.
(xvi) When the $n \times n$ matrix $A$ is Hermitian there exists a full set of $n$ orthonormal
eigenvectors or there exists a matrix $Q$ such that $Q^*AQ = \Lambda$, $QQ^* = I$, $Q^*Q = I$.
(xvii) Eigenvectors corresponding to real eigenvalues of a real matrix are real.


Observe that when the elements in a matrix are real one can have the eigenvalues 
real, rational, irrational or complex. Then the eigenvectors corresponding to irrational 
eigenvalues will be irrational and eigenvectors corresponding to complex eigenvalues 
will be complex. When a matrix has complex elements then also the eigenvalues can 
be real, rational, irrational or complex. 


Definition 4.4.1 (Hermitian definiteness). A Hermitian matrix is said to be (1) posi- 
tive definite, (2) positive semi-definite, (3) negative definite, (4) negative semi-definite, 
(5) indefinite if the eigenvalues are (1) all positive, (2) non-negative, (3) negative, (4) 
negative or zero, (5) some are positive and some are negative and at least one in each 
set. [Remember that the eigenvalues of a Hermitian matrix are real.] 


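Example 4.4.6. Reduce the following real quadratic form $u = x_1^2 - 2x_1x_2 + 2x_2^2$ to its
canonical form (linear combination of squares) through the eigenvalues as well as
through a different method.


Solution 4.4.6. Writing the quadratic form with a symmetric matrix, we have

\[
u = [x_1, x_2]\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = X'AX, \quad
A = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}.
\]

Let us evaluate the eigenvalues and eigenvectors of $A$. Consider the equation

\[
|A - \lambda I| = 0 \Rightarrow (1-\lambda)(2-\lambda) - 1 = 0 \Rightarrow \lambda^2 - 3\lambda + 1 = 0
\Rightarrow \lambda_1 = \frac{3}{2} + \frac{\sqrt{5}}{2}, \quad \lambda_2 = \frac{3}{2} - \frac{\sqrt{5}}{2}
\]

are the eigenvalues. An eigenvector corresponding to $\lambda_1$ is given by

\[
(A - \lambda_1I)X = O \Rightarrow
\begin{bmatrix} 1 - \frac{3}{2} - \frac{\sqrt{5}}{2} & -1 \\ -1 & 2 - \frac{3}{2} - \frac{\sqrt{5}}{2} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\Rightarrow X_1 = \begin{bmatrix} 1 \\ -\frac{1}{2}(1 + \sqrt{5}) \end{bmatrix}
\]

with $\|X_1\| = \alpha$, say, is an eigenvector. Then the normalized $X_1$ is $Y_1 = \frac{1}{\alpha}X_1$. An eigenvector
corresponding to $\lambda_2$ is given by

\[
(A - \lambda_2I)X = O \Rightarrow X_2 = \begin{bmatrix} 1 \\ -\frac{1}{2}(1 - \sqrt{5}) \end{bmatrix}
\]

is an eigenvector. The normalized vector is $Y_2 = \frac{1}{\beta}X_2$ where $\beta = \|X_2\|$. Let

\[
Q = (Y_1, Y_2).
\]

Since $Y_1'Y_2 = 0$, $Y_1'Y_1 = 1$, $Y_2'Y_2 = 1$ the matrix $Q$ is orthonormal and its inverse is its
transpose. Also

\[
A = Q\Lambda Q' \Rightarrow Q'AQ = \Lambda = \mathrm{diag}(\lambda_1, \lambda_2).
\]

Writing

\[
u = X'AX = X'Q\Lambda Q'X = Z'\Lambda Z, \quad Z = Q'X, \quad Z' = (z_1, z_2),
\]
\[
u = \lambda_1z_1^2 + \lambda_2z_2^2 = \Big(\frac{3}{2} + \frac{\sqrt{5}}{2}\Big)z_1^2 + \Big(\frac{3}{2} - \frac{\sqrt{5}}{2}\Big)z_2^2.
\]

This is one representation through the eigenvalues. Let us consider another procedure.
Since $A$ is symmetric try to reduce $A$ to a diagonal form by elementary operations:

\[
\begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = D.
\]

Let

\[
Y = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}^{-1}X = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}X.
\]

Then

\[
u = X'AX = Y'DY, \quad D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\]
\[
u = y_1^2 + y_2^2, \quad y_1 = x_1 - x_2, \quad y_2 = x_2.
\]

Note that the two representations are not the same.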
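Although the two canonical forms of Example 4.4.6 look different, both must reproduce $u = x_1^2 - 2x_1x_2 + 2x_2^2$ at every point. The Python sketch below evaluates all three expressions; the function names and the sample points are assumptions made only for this check.

```python
import math


def u_direct(x1, x2):
    """The quadratic form as given."""
    return x1 ** 2 - 2 * x1 * x2 + 2 * x2 ** 2


def u_eigen(x1, x2):
    """lambda_1 z_1^2 + lambda_2 z_2^2 with Z = Q'X and Q built from normalized eigenvectors."""
    s = math.sqrt(5)
    lams = [(3 + s) / 2, (3 - s) / 2]
    vecs = [(1.0, -(1 + s) / 2), (1.0, -(1 - s) / 2)]   # X_1 and X_2
    total = 0.0
    for lam, (v1, v2) in zip(lams, vecs):
        z = (v1 * x1 + v2 * x2) / math.hypot(v1, v2)    # component of X along Y_j
        total += lam * z ** 2
    return total


def u_elementary(x1, x2):
    """y_1^2 + y_2^2 with y_1 = x_1 - x_2 and y_2 = x_2."""
    y1, y2 = x1 - x2, x2
    return y1 ** 2 + y2 ** 2


for point in [(1.0, 0.0), (0.5, -2.0), (3.0, 4.0)]:
    print(point, u_direct(*point), u_eigen(*point), u_elementary(*point))
```

The eigenvalue route uses an orthonormal change of variables, whereas the elementary-operation route uses a nonsingular but not orthonormal one, which is why the coefficients differ.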


4.4.3 Commutative matrices 


One property, repeatedly used in many branches of statistics, econometrics, engineering
and other areas is the simultaneous reduction of two matrices to diagonal forms.
Let $A$ and $B$ be two $n \times n$ matrices. If there exists a $Q$ such that

\[
QAQ^{-1} = D_1 \quad \text{and} \quad QBQ^{-1} = D_2,
\]

where $D_1$ and $D_2$ are diagonal, that is, the same matrix $Q$ diagonalizes both $A$ and $B$,
then what should be the conditions on $A$ and $B$? We will investigate this aspect a little
bit further. If there exists a $Q$ then

\[
(QAQ^{-1})(QBQ^{-1}) = D_1D_2 = D_2D_1
\Rightarrow QABQ^{-1} = D_1D_2
\]
\[
\Rightarrow AB = Q^{-1}D_1D_2Q = Q^{-1}D_2D_1Q = (Q^{-1}D_2Q)(Q^{-1}D_1Q) = BA.
\]

That means $A$ and $B$ commute. Now, suppose $A$ and $B$ commute, that is, $AB = BA$. Does
there exist a $Q$ such that $QAQ^{-1} = D_1$ and $QBQ^{-1} = D_2$? For simplicity let us assume that
the eigenvalues of $A$ are distinct so that there exists a full set of linearly independent
eigenvectors for $A$ which means there exists a $Q$ with its regular inverse $Q^{-1}$. [The result
can also be proved without this restriction, the steps will be longer.] Then

\[
QAQ^{-1} = D_1.
\]

Let $\lambda_1$ be an eigenvalue of $A$ with $X_1$ a corresponding eigenvector. Then

\[
AX_1 = \lambda_1X_1 \Rightarrow BAX_1 = \lambda_1BX_1.
\]

But $BA = AB$ which means

\[
ABX_1 = \lambda_1BX_1 \Rightarrow A(BX_1) = \lambda_1(BX_1).
\]

This shows that $X_1$ and $BX_1$ are eigenvectors of $A$ for the same eigenvalue $\lambda_1$. By
assumption the eigenvalues are distinct. Then the null space of $(A - \lambda_1I)X = O$ has rank 1
which means $X_1$ and $BX_1$ are scalar multiples of each other or $BX_1 = \mu_1X_1$. This shows
that $A$ and $B$ have the same eigenvectors. Thus the same $Q$ will diagonalize $B$ also.
Hence we have a very important result.


(xviii) There exists a $Q$ such that $QAQ^{-1}$ is diagonal and $QBQ^{-1}$ is diagonal
simultaneously if and only if $A$ and $B$ commute, that is, iff $AB = BA$. If $A$ and $B$ are symmetric
then $Q^{-1} = Q'$ the transpose of $Q$. If $A$ and $B$ are Hermitian then $Q^{-1} = Q^*$ the
conjugate transpose of $Q$.

(xix) If the eigenvalues of a matrix $A$ are all distinct then the null space of $(A - \lambda_jI)X =
O$ has rank 1 where $\lambda_j$ is any eigenvalue of $A$.
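The key step of the argument, that $BX_1$ is a scalar multiple of $X_1$ when $AB = BA$ and the eigenvalues of $A$ are distinct, can be seen on a small example. The matrices below are hypothetical and chosen only for illustration; the helper names `matmul` and `matvec` are also assumptions.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(a, x):
    """Product of a matrix with a column vector stored as a flat list."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


A = [[2, 1], [1, 2]]      # eigenvalues 3 and 1, which are distinct
B = [[3, -1], [-1, 3]]    # commutes with A

X1 = [1, 1]               # A X1 = 3 X1
X2 = [1, -1]              # A X2 = 1 X2

print(matmul(A, B) == matmul(B, A))   # A and B commute
print(matvec(A, X1), matvec(B, X1))   # B X1 is a multiple of X1
print(matvec(A, X2), matvec(B, X2))   # B X2 is a multiple of X2
```

Here the matrix with columns proportional to $X_1$ and $X_2$ diagonalizes both $A$ and $B$, as the argument above predicts.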


Chisquaredness and independence of quadratic forms in real Gaussian random vari- 
ables are the two fundamental results in statistical inference problems connected with 
regression analysis, analysis of variance, analysis of covariance, model building and 
many related areas. Without the statistical terminology we will illustrate the results 
here. 


Example 4.4.7. Let $A = A'$ be a real $n \times n$ matrix. Consider the real quadratic form
$X'AX$. Then show that $X'AX$ can be written as

\[
X'AX = y_1^2 + \cdots + y_r^2, \quad r \le n,
\]

that is as a sum of squares, if and only if $A$ is idempotent of rank $r$. [This corresponds
to the result on chisquaredness in statistics.]


Solution 4.4.7. When $A = A'$ there exists an orthonormal matrix $P$ such that

\[
P'AP = D = \mathrm{diag}(\lambda_1,\dots,\lambda_n)
\]

where $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $A$. Let $P'X = Y$, $Y' = (y_1,\dots,y_n)$. Then

\[
X'AX = X'PDP'X = Y'DY = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2. \tag{a}
\]

If $A = A^2$ (idempotent) of rank $r$ then $r$ of the $\lambda_j$'s are 1 each and the remaining ones
are zeros. Then

\[
X'AX = y_1^2 + \cdots + y_r^2.
\]

Now, assume that $X'AX = y_1^2 + \cdots + y_r^2$ holds. Then from (a) it follows that $r$ of the
eigenvalues are 1 each and the remaining are zeros. Since $A$ is symmetric eigenvalues being
1's and zeros imply that $A$ is idempotent. [If $A$ is not symmetric then eigenvalues being
1's and zeros need not imply that $A$ is idempotent.]


Example 4.4.8. Let A and B be real symmetric n x n matrices. Let X' AX and X' BX be 
two real quadratic forms. Then show that 

X'AX =Ayyi ++ +A,y? and 

X'BX-Ayhi te tae or<nksn, 


(where the y;’s appearing in X "AX do not appear in X’ BX) iff AB = O. [This result cor- 
responds to the result on statistical independence of quadratic forms.] 


Solution 4.4.8. Assume that AB = O. Then (AB)' = B'4' = BA = O. That is, AB = BA. 
Since both matrices are real symmetric and since they commute there exists an or- 
thonormal matrix P such that 


P'AP-D, P'BP=D, 
where D, and D, are diagonal. But 
AB=0 = P'ABP- 0 = P'APP'BP - 0 2 D,D,=0. 


That is, if there is a nonzero diagonal element in D, the corresponding diagonal ele- 
ment in D, is zero and vice versa. Let Y = P'X. Then 


X'AX =Y'D,Y 


a linear function of y?'s for r of the y;'s if r is the rank of A. Writing them as yr, ... ,y; 
we have 


X' AX 2 Ay? + + Ay? (a) 


If s is the rank of B then s of the y;’s, which are not in the X ' AX representation, will be 
present in X' BX. That is, X' BX will be of the form 


X'BX = A oa ape Se in (b) 


For convenience, all the nonzero diagonal elements are denoted as $\lambda_1, \ldots, \lambda_{r+s}$. This establishes one part. Now if we assume the physical separation of the variables as in (a) and (b) above then by retracing the steps we can show that AB must be null when A = A' and B = B' and real. [The statistical property of independence need not imply physical separation of the variables as in (a) and (b) above. Hence the converse, that is, to show that AB = O from the property of independence, involves a few more steps from the implications of statistical independence. Therefore we will not discuss the proof here. For a proof see [7].]
(xx) Let $A_1, \ldots, A_k$ be n × n matrices so that $A_iA_j = A_jA_i$ for all i and j. Then there exists the same matrix Q such that $Q'A_jQ = D_j$, $j = 1, \ldots, k$ where $D_j$ is diagonal.


Definition 4.4.2 (A Hermitian form). Let X be an n x 1 vector and A an n x n matrix 
whose elements are in the complex field. Further, let A = A* (Hermitian). Then u = 
X* AX is called a Hermitian form, analogous to a quadratic form in the real case. 


A Hermitian form is always real since u is 1 × 1 and further

$$u^* = (X^*AX)^* = X^*A^*(X^*)^* = X^*AX = u.$$

If $u = a + ib$ then $u^* = a - ib$. If $u = u^*$ then b = 0 or u is real.


Example 4.4.9. Reduce the following Hermitian form to its canonical form. 


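$$u = 2x_1^*x_1 + (1-i)x_3^*x_1 + 2x_2^*x_2 + (1-i)x_3^*x_2 + (1+i)x_1^*x_3 + (1+i)x_2^*x_3 + x_3^*x_3.$$

Solution 4.4.9. Writing in the standard form X*AX we have

$$X^*AX = [x_1^*, x_2^*, x_3^*]\begin{bmatrix} 2 & 0 & 1+i \\ 0 & 2 & 1+i \\ 1-i & 1-i & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

Let us find the eigenvalues of A. Consider

$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 2-\lambda & 0 & 1+i \\ 0 & 2-\lambda & 1+i \\ 1-i & 1-i & 1-\lambda \end{vmatrix} = 0 \;\Rightarrow\; (2-\lambda)(\lambda^2 - 3\lambda - 2) = 0$$

$$\Rightarrow\; \lambda_1 = 2, \quad \lambda_2 = \tfrac{1}{2}(3 + \sqrt{17}), \quad \lambda_3 = \tfrac{1}{2}(3 - \sqrt{17}).$$

Let us compute the eigenvectors.

$$(A - \lambda_1 I)X = O \;\Rightarrow\; \begin{bmatrix} 0 & 0 & 1+i \\ 0 & 0 & 1+i \\ 1-i & 1-i & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_1 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$$

is one eigenvector. Consider

$$(A - \lambda_2 I)X = O \;\Rightarrow\; \begin{bmatrix} \tfrac{1}{2} - \tfrac{\sqrt{17}}{2} & 0 & 1+i \\ 0 & \tfrac{1}{2} - \tfrac{\sqrt{17}}{2} & 1+i \\ 1-i & 1-i & -\tfrac{1}{2} - \tfrac{\sqrt{17}}{2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_2 = \begin{bmatrix} \tfrac{(1+i)(1+\sqrt{17})}{8} \\ \tfrac{(1+i)(1+\sqrt{17})}{8} \\ 1 \end{bmatrix}$$

is one such vector. From $(A - \lambda_3 I)X = O$ an $X_3$ is given by

$$X_3 = \begin{bmatrix} \tfrac{(1+i)(1-\sqrt{17})}{8} \\ \tfrac{(1+i)(1-\sqrt{17})}{8} \\ 1 \end{bmatrix}.$$

Note that $X_1^*X_2 = 0$, $X_1^*X_3 = 0$, $X_2^*X_3 = 0$. Let $\alpha = \|X_1\|$, $\beta = \|X_2\|$ and $\gamma = \|X_3\|$ be the lengths. Let the normalized vectors be

$$Y_1 = \frac{1}{\alpha}X_1, \quad Y_2 = \frac{1}{\beta}X_2, \quad Y_3 = \frac{1}{\gamma}X_3 \quad\text{and}\quad Q = (Y_1, Y_2, Y_3).$$

Then

$$A = Q\Lambda Q^*, \quad Q^*Q = I, \quad QQ^* = I, \quad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \lambda_3).$$

Let

$$Y = Q^*X = \begin{bmatrix} \tfrac{1}{\alpha}(1, -1, 0) \\ \tfrac{1}{\beta}\big(\tfrac{(1-i)(1+\sqrt{17})}{8}, \tfrac{(1-i)(1+\sqrt{17})}{8}, 1\big) \\ \tfrac{1}{\gamma}\big(\tfrac{(1-i)(1-\sqrt{17})}{8}, \tfrac{(1-i)(1-\sqrt{17})}{8}, 1\big) \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\Rightarrow$$

$$y_1 = \frac{1}{\alpha}(x_1 - x_2) = \frac{1}{\sqrt{2}}(x_1 - x_2),$$

$$y_2 = \frac{1}{\beta}\Big[(1-i)\frac{(1+\sqrt{17})}{8}(x_1 + x_2) + x_3\Big],$$

$$y_3 = \frac{1}{\gamma}\Big[(1-i)\frac{(1-\sqrt{17})}{8}(x_1 + x_2) + x_3\Big].$$

Then

$$X^*AX = 2|y_1|^2 + \tfrac{1}{2}(3 + \sqrt{17})|y_2|^2 + \tfrac{1}{2}(3 - \sqrt{17})|y_3|^2.$$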


Verifications of the various steps are left to the student. 
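One way to carry out part of this verification numerically is sketched below. It is only an illustrative check, assuming the matrix A, the eigenvalues 2 and (3 ± √17)/2, and the eigenvectors as written in Solution 4.4.9; the helper names `matvec` and `inner` are introduced here for the sketch and are not from the text.

```python
import math

# Hermitian matrix of Example 4.4.9
A = [[2, 0, 1 + 1j],
     [0, 2, 1 + 1j],
     [1 - 1j, 1 - 1j, 1]]

def matvec(M, x):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def inner(x, y):
    """Complex inner product x* y (conjugate taken on the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))

r17 = math.sqrt(17)
eigenpairs = [
    (2.0, [1, -1, 0]),
    ((3 + r17) / 2, [(1 + 1j) * (1 + r17) / 8, (1 + 1j) * (1 + r17) / 8, 1]),
    ((3 - r17) / 2, [(1 + 1j) * (1 - r17) / 8, (1 + 1j) * (1 - r17) / 8, 1]),
]

# Largest entry of A x - lambda x for each pair; all should be near zero.
vecs = [[complex(c) for c in x] for _, x in eigenpairs]
residuals = []
for (lam, _), x in zip(eigenpairs, vecs):
    Ax = matvec(A, x)
    residuals.append(max(abs(Ax[i] - lam * x[i]) for i in range(3)))

# Inner products of distinct eigenvectors; all should be near zero.
cross = [abs(inner(vecs[i], vecs[j])) for i in range(3) for j in range(i + 1, 3)]
```

Both lists should contain only values at rounding-error level, which is consistent with the eigenvectors being mutually orthogonal before normalization.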
Definition 4.4.3 (The canonical form). The canonical form of a Hermitian form is given by

$$X^*AX = \lambda_1|y_1|^2 + \cdots + \lambda_n|y_n|^2$$

where $|y_j|$ denotes the absolute value of $y_j$, $j = 1, \ldots, n$, $Y' = (y_1, \ldots, y_n)$ and $\lambda_1, \ldots, \lambda_n$ are scalars.


Example 4.4.10. Reduce the Hermitian form in Example 4.4.9 to its canonical form 
by a sweep-out process (by elementary operations). 


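Solution 4.4.10. Consider the following elementary matrices $F_1, F_1^{-1}, F_2, F_2^{-1}$ where

$$F_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\tfrac{1}{2}(1-i) & 0 & 1 \end{bmatrix} \;\Rightarrow\; F_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \tfrac{1}{2}(1-i) & 0 & 1 \end{bmatrix};$$

$$F_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\tfrac{1}{2}(1-i) & 1 \end{bmatrix} \;\Rightarrow\; F_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & \tfrac{1}{2}(1-i) & 1 \end{bmatrix}.$$

Since A in this case is Hermitian we operate on the left of A by $F_1$ and on the right by $F_1^*$, the conjugate transpose of $F_1$. (In the real symmetric case we operate on the right with the transpose of the elementary matrix. In the complex case we operate on the right with the conjugate transpose.) Thus

$$F_1AF_1^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -\tfrac{1}{2}(1-i) & 0 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 1+i \\ 0 & 2 & 1+i \\ 1-i & 1-i & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & -\tfrac{1}{2}(1+i) \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 1+i \\ 0 & 1-i & 0 \end{bmatrix}.$$

Now let $B = F_2F_1AF_1^*F_2^*$ then

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\tfrac{1}{2}(1-i) & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 1+i \\ 0 & 1-i & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -\tfrac{1}{2}(1+i) \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix} = D, \text{ say.}$$

Then

$$A = F_1^{-1}F_2^{-1}D(F_1^{-1}F_2^{-1})^* = QDQ^* = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \tfrac{1}{2}(1-i) & \tfrac{1}{2}(1-i) & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 & \tfrac{1}{2}(1+i) \\ 0 & 1 & \tfrac{1}{2}(1+i) \\ 0 & 0 & 1 \end{bmatrix}.$$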
By straight multiplication of the matrices on the right we can verify the result. Consider the transformation Y = Q*X. That is,

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \tfrac{1}{2}(1+i) \\ 0 & 1 & \tfrac{1}{2}(1+i) \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\Rightarrow$$

$$y_1 = x_1 + \tfrac{1}{2}(1+i)x_3, \quad y_2 = x_2 + \tfrac{1}{2}(1+i)x_3, \quad y_3 = x_3.$$

Then

$$X^*AX = Y^*DY = 2|y_1|^2 + 2|y_2|^2 - |y_3|^2.$$

For example,

$$|y_1|^2 = y_1^*y_1 = \big[x_1^* + \tfrac{1}{2}(1-i)x_3^*\big]\big[x_1 + \tfrac{1}{2}(1+i)x_3\big]$$

$$= x_1^*x_1 + \tfrac{1}{2}(1+i)x_1^*x_3 + \tfrac{1}{2}(1-i)x_3^*x_1 + \tfrac{1}{4}(1-i)(1+i)x_3^*x_3$$

$$= |x_1|^2 + \tfrac{1}{2}|x_3|^2 + \tfrac{1}{2}(1+i)x_1^*x_3 + \tfrac{1}{2}(1-i)x_1x_3^*.$$

Similarly computing $|y_2|^2$, $|y_3|^2$ and substituting in $2|y_1|^2 + 2|y_2|^2 - |y_3|^2$ we can easily verify that we get back the Hermitian form given in Example 4.4.9.
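The sweep-out steps of Solution 4.4.10 can also be checked by direct multiplication. The following is a minimal sketch, assuming the matrix A of Example 4.4.9 and the elementary matrices as given above; `matmul` and `ctranspose` are helper names made up for this illustration.

```python
A = [[2, 0, 1 + 1j],
     [0, 2, 1 + 1j],
     [1 - 1j, 1 - 1j, 1]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def ctranspose(X):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]

h = 0.5 * (1 - 1j)
F1 = [[1, 0, 0], [0, 1, 0], [-h, 0, 1]]
F2 = [[1, 0, 0], [0, 1, 0], [0, -h, 1]]

# D = F2 F1 A F1* F2*, expected to be diag(2, 2, -1)
G = matmul(F2, F1)
D = matmul(matmul(G, A), ctranspose(G))

# Q = F1^{-1} F2^{-1}; Q D Q* should give back A
Q = [[1, 0, 0], [0, 1, 0], [h, h, 1]]
D_exact = [[2, 0, 0], [0, 2, 0], [0, 0, -1]]
A_back = matmul(matmul(Q, D_exact), ctranspose(Q))

# Q Q* is not the identity, so Y = Q* X is not a unitary transformation
QQstar = matmul(Q, ctranspose(Q))
```

The last product has 2 in its (3, 3) position, so the transformation obtained by elementary operations is nonsingular but not unitary.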


Note that if a real quadratic form in a real symmetric matrix A or if a Hermitian 
form in a Hermitian matrix A is to be reduced to a linear function of squares (canoni- 
cal form), with the coefficients of the squares not necessarily the eigenvalues of A, then 
the easier method would be to use elementary operations on the left and on the right 
of A (pre and post multiplications by elementary matrices) rather than going through 
the eigenvalues of A. In the Hermitian case if A is premultiplied by $G_1$, a product of the basic elementary matrices, then A is to be postmultiplied by $G_1^*$, the conjugate transpose of $G_1$. (In the real symmetric case we postmultiply by $G_1'$ only.) Such successive multiplications will reduce a Hermitian form into the following form:

$$X^*AX = X^*QDQ^*X = Y^*DY, \quad Y = Q^*X, \quad D = \mathrm{diag}(d_1, \ldots, d_n),$$

$$X^*AX = d_1|y_1|^2 + \cdots + d_n|y_n|^2$$

where $|y_j|$ is the absolute value of $y_j$, Q is the product of all elementary matrices used on the left. By definition, Q will be nonsingular. But in this case QQ' or QQ* need not be an identity matrix. In other words the linear transformation Y = Q*X need not be
an orthogonal or unitary transformation. The procedure through eigenvalues will give 
an orthogonal transformation when A is real symmetric and a unitary transformation 
when A is Hermitian. 
(xxi) If A = A* then X*AX is real for all X. [Take the conjugate transpose. A 1 × 1 matrix with its conjugate transpose equal to itself is real.]
(xxii) If A = A' and real and if $A = P\Lambda P'$, $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ then

$$A = \lambda_1P_1P_1' + \cdots + \lambda_nP_nP_n'$$

where $P_1, \ldots, P_n$ are the columns of P.
(xxiii) If A = A* and if $A = Q\Lambda Q^*$ then

$$A = \lambda_1Q_1Q_1^* + \cdots + \lambda_nQ_nQ_n^*$$

where $Q_1, \ldots, Q_n$ are the columns of Q.
(xxiv) If A is Hermitian then iA is skew Hermitian.


Definition 4.4.4 (Definiteness of Hermitian forms). A Hermitian form X*AX, A = A* is positive definite if for all possible non-null X, X*AX > 0, positive semi-definite if X*AX ≥ 0, negative definite if X*AX < 0, negative semi-definite if X*AX ≤ 0 and indefinite if for some X, X*AX > 0 and for some other X it is negative.


Note that when A = A* we have X* AX a real quantity. Also we know that when 
A = A* all the eigenvalues of A are real. It is easy to show that all definitions of defi- 
niteness of a Hermitian form are equivalent. The proofs are parallel to those in the real 
symmetric case. 


Example 4.4.11. Show that the following Hermitian form can be written as a quadratic form in real variables.

$$h = [x_1^*, x_2^*]\begin{bmatrix} 3 & 2+i \\ 2-i & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 3x_1^*x_1 + 5x_2^*x_2 + (2+i)x_1^*x_2 + (2-i)x_2^*x_1.$$

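Solution 4.4.11. Let $x_1 = u + iv$, $x_2 = x + iy$ where u, v, x, y are real and $i = \sqrt{-1}$. Then

$$x_1^*x_1 = (u - iv)(u + iv) = u^2 + v^2, \quad x_2^*x_2 = x^2 + y^2,$$

$$x_1^*x_2 = (u - iv)(x + iy) = ux + vy + i(uy - vx),$$

$$(2+i)x_1^*x_2 + (2-i)x_2^*x_1 = (2+i)[ux + vy + i(uy - vx)] + (2-i)[ux + vy + i(xv - yu)]$$

$$= 4(ux + vy) + 2(-uy + vx).$$

Hence the Hermitian form

$$h = 3(u^2 + v^2) + 5(x^2 + y^2) + 4ux + 4vy - 2uy + 2vx$$

$$= [u, v, x, y]\begin{bmatrix} 3 & 0 & 2 & -1 \\ 0 & 3 & 1 & 2 \\ 2 & 1 & 5 & 0 \\ -1 & 2 & 0 & 5 \end{bmatrix}\begin{bmatrix} u \\ v \\ x \\ y \end{bmatrix}$$

$$= [u, v]\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} + [x, y]\begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} + 2[u, v]\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}.$$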


This, in fact, is a general result. 


(xxv) A Hermitian form in n complex variables is equivalent to a quadratic form in 
2n real variables or equivalent to two quadratic forms in n real variables each plus 
two bilinear forms in n real variables each. 
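For the particular form in Example 4.4.11 this equivalence can be spot-checked numerically. The sketch below is only an illustration: it evaluates the Hermitian form and the 4 × 4 real quadratic form at random points and records the largest discrepancy. The function names and the random test points are assumptions of this sketch.

```python
import random

def hermitian_form(x1, x2):
    """h = X* A X for A = [[3, 2+i], [2-i, 5]]; returns a complex number."""
    A = [[3, 2 + 1j], [2 - 1j, 5]]
    X = [x1, x2]
    return sum(X[i].conjugate() * A[i][j] * X[j]
               for i in range(2) for j in range(2))

def real_quadratic_form(u, v, x, y):
    """The same form in real variables, with x1 = u + iv and x2 = x + iy."""
    S = [[3, 0, 2, -1],
         [0, 3, 1, 2],
         [2, 1, 5, 0],
         [-1, 2, 0, 5]]
    w = [u, v, x, y]
    return sum(w[i] * S[i][j] * w[j] for i in range(4) for j in range(4))

random.seed(0)
max_gap = 0.0
for _ in range(100):
    u, v, x, y = (random.uniform(-5, 5) for _ in range(4))
    h = hermitian_form(complex(u, v), complex(x, y))
    # abs() of the complex difference also catches a nonzero imaginary part
    max_gap = max(max_gap, abs(h - real_quadratic_form(u, v, x, y)))
```

A value of `max_gap` at rounding-error level indicates that the two expressions agree and that the Hermitian form is real at every sampled point.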


This result is frequently used when extending the theory of real Gaussian multivari- 
ate statistical distribution to the corresponding multivariate Gaussian distribution in 
complex random variables. 

Another result which is frequently used in various applications involving a real 
quadratic form, or a Hermitian form in complex variables, is a representation of the 
matrix of the quadratic or Hermitian form in terms of a square root when the matrix 
is positive definite or positive semi-definite. Consider a real quadratic form u and a 
Hermitian form v where 

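$$u = X'AX, \quad A = A' \quad\text{and}\quad v = Z^*BZ, \quad B = B^*$$

where X is an n × 1 real vector and Z is an n × 1 vector in the complex space. We have already seen that there exist a full set of orthonormal vectors, and orthogonal and unitary matrices P and Q such that

$$P'AP = \Lambda \quad\text{and}\quad Q^*BQ = \mu$$

where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, $\mu = \mathrm{diag}(\mu_1, \ldots, \mu_n)$ with the $\lambda_j$'s the eigenvalues of A and the $\mu_j$'s the eigenvalues of B. If A and B are positive definite then $\lambda_j > 0$, $\mu_j > 0$, $j = 1, \ldots, n$.

We can write $\lambda_j = \sqrt{\lambda_j}\sqrt{\lambda_j}$ and $\mu_j = \sqrt{\mu_j}\sqrt{\mu_j}$. Define

$$\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}) \quad\text{and}\quad \mu^{1/2} = \mathrm{diag}(\sqrt{\mu_1}, \ldots, \sqrt{\mu_n}).$$

Then

$$A = P\Lambda P' = P\Lambda^{1/2}\Lambda^{1/2}P' = P\Lambda^{1/2}P'\,P\Lambda^{1/2}P' = A_1^2, \quad A_1 = P\Lambda^{1/2}P'$$

and similarly

$$B = B_1^2, \quad B_1 = Q\mu^{1/2}Q^*$$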
where $A_1$ is called the positive definite square root of the positive definite matrix A and $B_1$ is called the Hermitian positive definite square root of the Hermitian positive definite matrix B. Note that if A and B are positive definite or positive semi-definite one can express both A and B as $A = A_1A_1'$ and $B = B_1B_1^*$ for some matrices $A_1$ and $B_1$ by following through the above procedure. What about the converses? If a real matrix A can be written as A = CC' where C may be n × m, n need not be equal to m, is A going to be at least positive semi-definite? Let us consider an arbitrary quadratic form

$$X'AX = X'CC'X = Y'Y = y_1^2 + \cdots + y_m^2 \ge 0, \quad Y = C'X, \quad Y' = (y_1, \ldots, y_m)$$

for all real non-null X. This means that the matrix A is either positive definite or positive semi-definite. If C has rank n (which requires n ≤ m) then C'X = O only for X = O and A is positive definite; otherwise A is only positive semi-definite. The Hermitian case is parallel. If any matrix B can be written in the form B = GG* then the Hermitian form

$$Z^*BZ = Z^*GG^*Z = Y^*Y = |y_1|^2 + \cdots + |y_m|^2 \ge 0, \quad Y = G^*Z.$$

That is, B is at least positive semi-definite.


(xxvi) Any positive definite or positive semi-definite (or Hermitian positive definite 
or Hermitian positive semi-definite) matrix A can be written as A = CC’ (or A = CC*) 
and conversely, any matrix A which can be written as A = CC' (or A = CC*), where 
C may be rectangular, is at least positive semi-definite. 
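The converse part of (xxvi) can be illustrated with a small rectangular C. The sketch below assumes a hypothetical 3 × 2 matrix C chosen only for illustration. It evaluates X'CC'X as the squared length of C'X at random points, and then at one non-null X for which C'X = O, showing that A = CC' can be semi-definite without being definite.

```python
import random

def quad_form_CCt(C, X):
    """Return X'(CC')X, computed as ||C'X||^2, for an n x m matrix C."""
    n, m = len(C), len(C[0])
    Y = [sum(C[i][j] * X[i] for i in range(n)) for j in range(m)]  # Y = C'X
    return sum(y * y for y in Y)

# Hypothetical 3 x 2 matrix: its rank is 2 < n = 3
C = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]

random.seed(1)
values = [quad_form_CCt(C, [random.uniform(-1, 1) for _ in range(3)])
          for _ in range(200)]
smallest = min(values)  # a sum of squares, so never negative

# X = (1, 1, -1)' gives C'X = O, so the quadratic form vanishes there
at_null_vector = quad_form_CCt(C, [1.0, 1.0, -1.0])
```

Because the form is zero at a non-null X, this CC' is positive semi-definite but not positive definite.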


Definition 4.4.5 (Square roots). A symmetric positive definite (or Hermitian positive definite) square root of a symmetric positive definite (or Hermitian positive definite) matrix A is $C = P\Lambda^{1/2}P'$ (or $C = P\Lambda^{1/2}P^*$) where P is the matrix of normalized eigenvectors of A, $\Lambda$ is the diagonal matrix of the eigenvalues of A and $\Lambda^{1/2}$ denotes the diagonal matrix with the diagonal elements being the positive square roots of the diagonal elements in $\Lambda$.
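As an illustration of this definition, the following sketch computes the symmetric square root of a 2 × 2 symmetric positive definite matrix. The function name `sqrt_sym_2x2` and the test matrix [[2, 1], [1, 2]] are assumptions of the sketch. It accumulates the sum of $\sqrt{\lambda_j}\,P_jP_j'$ over the normalized eigenvectors, which is the same matrix as $P\Lambda^{1/2}P'$.

```python
import math

def sqrt_sym_2x2(a, b, d):
    """Symmetric positive definite square root of [[a, b], [b, d]].

    Builds C = P Lambda^{1/2} P' as the sum of sqrt(lambda_j) P_j P_j'.
    Assumes the matrix is positive definite and b != 0.
    """
    mean = (a + d) / 2.0
    rad = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    lams = [mean + rad, mean - rad]        # eigenvalues from the quadratic formula
    C = [[0.0, 0.0], [0.0, 0.0]]
    for lam in lams:
        p = [b, lam - a]                   # an eigenvector for lam when b != 0
        norm = math.hypot(p[0], p[1])
        p = [p[0] / norm, p[1] / norm]     # normalized eigenvector
        for i in range(2):
            for j in range(2):
                C[i][j] += math.sqrt(lam) * p[i] * p[j]
    return C

C = sqrt_sym_2x2(2.0, 1.0, 2.0)
C_squared = [[sum(C[i][k] * C[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
```

Here `C_squared` should reproduce the original matrix up to rounding, and C itself is symmetric with positive eigenvalues 1 and √3.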


Exercises 4.4 


4.4.1. Compute the eigenvalues and eigenvectors of 
1 1«i -2i 1 1+i  —2i 
A-|1-i 2+i 2-i|, Bz-|-1-«i 3 1-i 
2i 2+i 3 -2i -1-i 4 
4.4.2. Write the following matrices as the sum of symmetric and skew symmetric matrices:


3 -2 1 
1 5 E Ws 4 1 2 

A- , B=]1 0 2|, C= 
3 7 Fai 1 -1 0 4 
5 0 -1 6 


4.4.3. Write the matrices in Exercise 4.4.2 as the sum of two nonsingular matrices. 


4.4.4. Write the following matrices as the sum of Hermitian and skew Hermitian ma- 
trices: 


2-i 1+2i 3i 

A-| 2i 3-i 2+5iļ, 
1«i Gi 2-i 
1«2i 2-i 342i 1+i 
1-i 3+2i 4i —bi 
24i 2-«5i 1-i 1+i 
345i 2-i 1+i 5i 


4.4.5. Compute the eigenvalues and eigenvectors of the following stochastic matri- 
Ces: 


0.2 04 02 02 


01 07 02 
03 01 04 02 
A-[03 05 02|, B= 
0.2 04 041 02 
0.3 02 05 


03 01 03 O4 


4.4.6. Are the following statements true? Give a counter example if false and prove the results if true.

(1) If the eigenvalues are real then the matrix is symmetric or Hermitian.

(2) If the eigenvalues are all purely imaginary then the matrix is skew symmetric or skew Hermitian.

(3) If the eigenvalues are 1's and 0's then the matrix is idempotent.

(4) If the eigenvalues are ±1 then the matrix is orthonormal.

(5) If the eigenvalues are such that $\lambda\lambda^* = 1$ then the matrix is unitary.

(6) Since the eigenvalues of A are the eigenvalues of A', the eigenvalues of A and (A + A')/2 are the same.

(7) If A is n × n and real symmetric then there is a full set of n mutually orthogonal eigenvectors whether some eigenvalues are zero or repeated.

(8) If $\lambda$ is an eigenvalue and X a normalized eigenvector corresponding to $\lambda$ of a matrix A then $\lambda = X'AX$ if X is real, $\lambda = X^*AX$ if X is in the complex space.

(9) If the eigenvalues of a matrix are real then the eigenvectors are also real.
4.4.7. Let
1111 -1 -1 1 1 
A= 1 1 1 1 — B= -1 0 0 1 
1111 0 0 -1 
1 1 1 1 1 -1 -1 


If possible, reduce A and B simultaneously to diagonal forms. Compute that orthonor- 
mal matrix which will achieve this. 


4.4.8. Repeat Exercise 4.4.7 if A and B are n x n matrices where 


Te bec o 


4.4.9. Let $A_1, A_2$ be n × n real symmetric idempotent matrices such that $A_1 + A_2 = I_n$. Show that (1) both $A_1$ and $A_2$ can be simultaneously reduced to diagonal forms, (2) rank of $A_1$ plus rank of $A_2$ is n, (3) $A_1$ and $A_2$ commute.


4.4.10. Let $A_1$ and $A_2$ be n × n real symmetric matrices so that $A_1 + A_2 = I$ with $A_1A_2 = O$. Show that $A_1$ and $A_2$ commute.


4.4.11. Let $A_1$ and $A_2$ be n × n real symmetric matrices so that $A_1 + A_2 = I$ and the rank of $A_1$ plus the rank of $A_2$ is n. Show that (1) $A_1A_2 = O$, (2) $A_1$ and $A_2$ are both idempotent.


4.4.12. Generalize Exercises 4.4.9, 4.4.10 and 4.4.11 and establish the corresponding results if k such n × n matrices are involved, k ≥ 2, such that $A_1 + \cdots + A_k = I$. The orthogonality condition is to be interpreted as $A_iA_j = O$ for all $i \ne j$.


4.4.13. Reduce the following Hermitian forms to their canonical forms by using ele- 
mentary operations: 


(a) $2x_1^*x_1 + 3x_2^*x_2 + 2x_3^*x_3 + (1+i)x_1^*x_2 + (1+i)x_1^*x_3 + (2-i)x_2^*x_3 + (1-i)x_2^*x_1 + (2+i)x_3^*x_2 + (1-i)x_3^*x_1$,

(b) $3x_1^*x_1 + 4x_2^*x_2 + 2x_3^*x_3 + ix_1^*x_2 + 2ix_1^*x_3 - ix_2^*x_1 + (1-i)x_2^*x_3 - 2ix_3^*x_1 + (1+i)x_3^*x_2$.

4.4.14. Repeat Exercise 4.4.13 by using the procedure through eigenvalues. 


4.4.15. Check the definiteness of the Hermitian forms in Exercise 4.4.13. 


4.4.16. Heisenberg's uncertainty principle in quantum mechanics. Consider the position matrix P, which is symmetric, and momentum matrix Q which is skew symmetric. These P and Q satisfy the equation QP − PQ = I. By using the Cauchy-Schwarz inequality, or otherwise, show that

$$1 \le 2\,\frac{\|QX\|}{\|X\|}\,\frac{\|PX\|}{\|X\|}.$$

Hint: $\|X\|^2 = X'X = X'IX$.


4.4.17. If A is a real skew symmetric or skew Hermitian matrix then show that the 
determinant of A is either zero or positive. 


4.4.18. Compute the symmetric positive definite square root of A and the Hermitian 
positive definite square root of B where 


a-l? 1 B- e Denn 
1 3 1-i 3 
4.4.19. For the Kronecker product defined in Exercise 2.6.8 where $A = (a_{ij})$ and p × p, $B = (b_{ij})$ and q × q, and the Kronecker product denoted by $A \otimes B$, show that

(1) $$|A \otimes B| = \Big(\prod_{i=1}^{p}\lambda_i\Big)^{q}\Big(\prod_{j=1}^{q}\nu_j\Big)^{p}$$

where the $\lambda_i$'s and the $\nu_j$'s are the eigenvalues of A and B respectively;
(2) The eigenvalues of $A \otimes B$ are $\lambda_i\nu_j$ for all i and j;
(3) What are the eigenvectors of $A \otimes B$ in terms of the eigenvectors corresponding to $\lambda_i$ and $\nu_j$?


4.4.20. Let A be skew symmetric. Construct a matrix B in terms of I + A and I — A so 
that B is orthonormal. 


4.4.21. Show that the characteristic polynomial of the matrix

$$A = \begin{bmatrix} A_1 & A_2 \\ O & A_3 \end{bmatrix}$$

is the product of the characteristic polynomials of $A_1$ and $A_3$.


4.4.22. Show that the eigenvalues of A and A’ coincide whereas the eigenvalues of A 
and A* are complex conjugates of each other. 


4.4.23. Similar matrices. Square matrices A and B are said to be similar (notation: A ~ B) if there exists a nonsingular matrix Q such that $A = QBQ^{-1}$. If A ~ B then show that

(i) A' ~ B'

(ii) $A^k$ ~ $B^k$

(iii) $A - \lambda I$ ~ $B - \lambda I$

(iv) |A| = |B|

(v) rank(A) = rank(B)

(vi) P(A) ~ P(B)

where P(A) is a polynomial in A.


4.4.24. Show that the following two matrices are similar. That is,

$$\begin{bmatrix} B_1 & O \\ O & B_2 \end{bmatrix} \sim \begin{bmatrix} B_2 & O \\ O & B_1 \end{bmatrix}.$$

4.4.25. If A ~ B then show that tr(A) = tr(B).


4.4.26. If A ~ B then show that (i) if A is idempotent then B is idempotent, (ii) if A is 
nilpotent of degree r then B is also nilpotent of degree r. 


4.4.27. Kernel of the matrix A. Notation: Ker(A). Consider the homogeneous system of linear equations AX = O. Then the set of all solutions {X} is the null space or the right null space of A. This right null space is also called the kernel of A.

Image of the matrix A. Notation: Im(A). Let A be an m × n matrix. Consider the set of all m × 1 vectors Y such that Y = AX for some n × 1 vector X. This set is called the image or range of A. That is, Im(A) = {Y : Y = AX for some X}. If the matrix product AB is defined then show that

$$\mathrm{Ker}(AB) \supseteq \mathrm{Ker}(B)$$

with equality if $A^{-1}$ exists, and

$$\mathrm{Im}(AB) \subseteq \mathrm{Im}(A)$$

with equality if $B^{-1}$ exists.
The following are some problems on ranks posed by Dr R. B. Bapat of the Indian 
Statistical Institute, New Delhi, India. 


4.4.28. Let A, B be n × n matrices. Show that

$$\mathrm{rank}\begin{bmatrix} A+B & A \\ A & A \end{bmatrix} = \mathrm{rank}(A) + \mathrm{rank}(B).$$


4.4.29. Let A be m × n and suppose $A = \begin{bmatrix} B & C \\ O & D \end{bmatrix}$. Show that rank(A) ≥ rank(B) + rank(D) and that the inequality may be strict.


4.4.30. Let A be an n × n matrix of rank n − 1 and suppose that each row sum and each column sum of A is zero. Show that every submatrix of A of order (n − 1) × (n − 1) is nonsingular.


4.4.31. Let A and B be n × n nonsingular matrices. Show that $\mathrm{rank}(A - B) = \mathrm{rank}(A^{-1} - B^{-1})$.
4.4.32. Let A be n × n of rank 1. Show that |I + A| = 1 + tr(A).


4.4.33. Let A be m × n, where the elements could be complex also. Show that rank(A) = rank(AA*) = rank(A*A). Is it true that rank(A) = rank(AA')?


4.4.34. Let A and B be n × n where B is positive semidefinite. Show that $\mathrm{rank}(AB) = \mathrm{rank}(AB^{1/2})$.


4.4.35. Let A and B be n × n. Show that $\mathrm{rank}\begin{bmatrix} A & I \\ I & B \end{bmatrix} = n$ if and only if $B = A^{-1}$.
4.4.36. Show that rank [3 f ] = n + rank(AB). 


4.4.37. Let A be n × n. If $\mathrm{rank}(A) = \mathrm{rank}(A^2)$ then show that $\mathrm{rank}(A) = \mathrm{rank}(A^m)$ for all m ≥ 1.


4.4.38. Let A be n × n and let rank(A) = 1. Show that $\mathrm{rank}(A^2) = 1$ if and only if tr(A) ≠ 0.


4.4.39. Let A be n × n. If $\mathrm{rank}(A^k) = \mathrm{rank}(A^{k+1})$, then show that $\mathrm{rank}(A^k) = \mathrm{rank}(A^m)$ for all m ≥ k.


4.4.40. Let A be n × n. Show that $A = A^2$ if and only if rank(A) + rank(I − A) = n.


5 Some applications of matrices and determinants 


5.0 Introduction 


A few applications in solving difference and differential equations, applications in 
evaluating Jacobians of matrix transformations, optimization problems, probability 
measures and Markov processes and some topics in statistics will be discussed in this 
chapter. 


5.1 Difference and differential equations 


In order to introduce the idea of how eigenvalues can be used to solve difference and 
differential equations a few illustrative examples will be done here. 


5.1.1 Fibonacci sequence and difference equations 


The famous Fibonacci sequence is the following: 
0, 1, 1, 2, 3, 5, 8, 13, 21, ... 


where the sum of two consecutive numbers is the next number. Surprisingly, this se- 
quence appears in very many places in nature. Consider a living micro organism such 
as a cell which is reproducing in the following fashion: To start with there is one 
mother. The mother cell needs only one unit of time to reproduce. Each mother pro- 
duces only one daughter cell. The daughter cell needs one unit of time to grow and 
then one unit of time to reproduce. Let us examine the population size at each stage: 


stage 1   number = 1   one mother at the first unit of time
stage 2   number = 1   1 mother only
stage 3   number = 2   1 mother + 1 young daughter
stage 4   number = 3   1 mother, 1 mature and 1 young daughters
stage 5   number = 5   2 mothers, 1 mature and 2 young daughters


and so on. The population size follows the sequence 1,1,2,3,5,8, ... the famous Fi- 
bonacci sequence. 

If you look at the capitulum of a sunflower the florets, or the seeds when the flo- 
rets become seeds, seem to be arranged along spirals starting from the periphery and 
going inward. You will see one set of such radial spirals going in one direction and an- 
other set of radial spirals going in the opposite direction. These numbers are always 
two successive numbers from the Fibonacci sequence. In a small sunflower it may be 
(3,5), in a slightly larger flower it may be (5, 8) and so on. Arrangement of florets on 
a pineapple, thorns on certain cactus heads, leaves on certain palm trees, petals in dahlias and in very many such divergent places one meets the Fibonacci sequence. A theory of growth and forms, an explanation for the emergence of the Fibonacci sequence, a mathematically reconstructed sunflower head and many other details can be seen in the paper [5]. Incidentally, the journal Mathematical Biosciences has adopted the above mathematically reconstructed sunflower model as its cover design from 1976 onward.

If the Fibonacci number at the k-th stage is denoted by $F_k$ then the number at the (k + 2)-th stage is

$$F_{k+2} = F_{k+1} + F_k. \tag{5.1.1}$$

This is a difference equation of order 2. $F_{k+1} - F_k$ is a first order difference and $F_{k+2} - F_{k+1}$ is again a first order difference. Then going from $F_k$ to $F_{k+2}$ is a second order difference. That is, (5.1.1) is a second order difference equation. One way of computing $F_k$ for any k, k may be 10 385, is to go through properties of matrices. In order to write a matrix equation let us introduce dummy equations such as $F_k = F_k$, $F_{k+1} = F_{k+1}$ and so on. Consider the equations

$$F_{k+2} = F_{k+1} + F_k, \qquad F_{k+1} = F_{k+1} \tag{5.1.2}$$

and

$$V_k = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}.$$

Then the two equations in (5.1.2) can be written as

$$V_{k+1} = AV_k, \quad A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}. \tag{5.1.3}$$

Let us assume $F_0 = 0$ and $F_1 = 1$ which means $V_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then from (5.1.3) we have

$$V_1 = AV_0, \quad V_2 = AV_1 = A(AV_0) = A^2V_0, \quad \ldots, \quad V_k = A^kV_0.$$


In order to compute $V_k$ we need to compute only $A^k$ since $V_0$ is known. Straight multiplication of A with A for a total of 10 385 times is not an easy process. We will use the property that the eigenvalues of $A^k$ are the k-th powers of the eigenvalues of A, sharing the same eigenvectors of A. Let us compute the eigenvalues and the eigenvectors of A.

$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 1-\lambda & 1 \\ 1 & -\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - \lambda - 1 = 0 \;\Rightarrow\; \lambda_1 = \frac{1+\sqrt{5}}{2}, \quad \lambda_2 = \frac{1-\sqrt{5}}{2}.$$
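An eigenvector corresponding to $\lambda_1 = \frac{1+\sqrt{5}}{2}$ is given by

$$\begin{bmatrix} 1-\lambda_1 & 1 \\ 1 & -\lambda_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; x_1 = \frac{1+\sqrt{5}}{2}, \; x_2 = 1, \text{ or}$$

$$X_1 = \begin{bmatrix} \frac{1+\sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix}$$

is an eigenvector. Similarly an eigenvector corresponding to $\lambda_2 = \frac{1-\sqrt{5}}{2}$ is given by

$$X_2 = \begin{bmatrix} \frac{1-\sqrt{5}}{2} \\ 1 \end{bmatrix} = \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix}.$$

Let

$$Q = (X_1, X_2) = \begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix} \;\Rightarrow\; Q^{-1} = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix}.$$

Therefore

$$A = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix}.$$

Since A and $A^k$ share the same eigenvectors we have

$$A^k = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1^k & 0 \\ 0 & \lambda_2^k \end{bmatrix}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} \lambda_1^{k+1} - \lambda_2^{k+1} & -\lambda_2\lambda_1^{k+1} + \lambda_1\lambda_2^{k+1} \\ \lambda_1^k - \lambda_2^k & -\lambda_2\lambda_1^k + \lambda_1\lambda_2^k \end{bmatrix}.$$

Hence

$$V_k = A^kV_0 = A^k\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix} \lambda_1^{k+1} - \lambda_2^{k+1} \\ \lambda_1^k - \lambda_2^k \end{bmatrix} = \begin{bmatrix} F_{k+1} \\ F_k \end{bmatrix}.$$

Therefore, since $\lambda_1 - \lambda_2 = \sqrt{5}$,

$$F_k = \frac{1}{\sqrt{5}}\Big[\Big(\frac{1+\sqrt{5}}{2}\Big)^k - \Big(\frac{1-\sqrt{5}}{2}\Big)^k\Big].$$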
Since $\big|\frac{1-\sqrt{5}}{2}\big| < 1$, $\lambda_2^k$ will approach zero when k is large.

$$\lim_{k\to\infty}\frac{F_{k+1}}{F_k} = \lambda_1 = \frac{1+\sqrt{5}}{2} \approx 1.618. \tag{5.1.4}$$

Evidently, one has to take only powers of $\lambda_1$ when k is large. That is, $F_k \approx \frac{1}{\sqrt{5}}\lambda_1^k$ for large k. This number $\frac{1+\sqrt{5}}{2}$ is known as the "golden ratio" which appears in nature at many places.

One general observation that can be made is that we have an equation of the type

$$V_k = A^kV_0 = Q\Lambda^kQ^{-1}V_0$$

for an n × n matrix A where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and Q is the matrix of eigenvectors of A, assuming $|Q| \ne 0$. Then setting $Q^{-1}V_0 = C$, $C' = (c_1, \ldots, c_n)$ we have

$$\Lambda^kC = (\lambda_1^kc_1, \ldots, \lambda_n^kc_n)'.$$

If $X_1, \ldots, X_n$ are the eigenvectors of A, constituting $Q = (X_1, \ldots, X_n)$, then

$$V_k = Q(\Lambda^kQ^{-1}V_0) = (X_1, \ldots, X_n)\begin{bmatrix} \lambda_1^kc_1 \\ \vdots \\ \lambda_n^kc_n \end{bmatrix} = c_1\lambda_1^kX_1 + \cdots + c_n\lambda_n^kX_n$$

which is a linear function of $\lambda_i^kX_i$, $i = 1, \ldots, n$ or a linear combination of the so called pure solutions $\lambda_i^kX_i$.
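The Fibonacci computation above can be sketched in code as follows. This is an illustrative sketch, not the text's procedure: `fib_matrix` forms $A^k$ exactly by repeated squaring (a swapped-in technique that avoids k successive multiplications), while `fib_closed_form` evaluates the eigenvalue formula $F_k = (\lambda_1^k - \lambda_2^k)/\sqrt{5}$ in floating point. Both function names are assumptions of this sketch.

```python
import math

def mat_mul_2x2(X, Y):
    """Product of two 2 x 2 matrices stored as lists of rows."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0],
             X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0],
             X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]

def fib_matrix(k):
    """F_k read from V_k = A^k V_0, with A = [[1, 1], [1, 0]], V_0 = (1, 0)'."""
    result = [[1, 0], [0, 1]]
    base = [[1, 1], [1, 0]]
    n = k
    while n > 0:                      # repeated squaring: O(log k) products
        if n & 1:
            result = mat_mul_2x2(result, base)
        base = mat_mul_2x2(base, base)
        n >>= 1
    return result[1][0]               # second component of A^k (1, 0)'

def fib_closed_form(k):
    """Eigenvalue formula; floating point, so only approximate for large k."""
    s5 = math.sqrt(5.0)
    lam1, lam2 = (1 + s5) / 2, (1 - s5) / 2
    return (lam1 ** k - lam2 ** k) / s5
```

For instance, `fib_matrix(10385)` returns the exact integer, whereas the closed form is better suited to studying the growth rate, since the ratio of successive terms approaches the golden ratio.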


Example 5.1.1. Suppose that a system is growing in the following fashion. The first stage size plus 3 times the second stage size plus the third stage size is the fourth stage size. Let $F_0 = 0$, $F_1 = 1$, $F_2 = 1$ be the initial conditions. Then

$$F_3 = 0 + 3(1) + 1 = 4, \quad F_4 = 1 + 3(1) + 4 = 8, \quad F_5 = 1 + 3(4) + 8 = 21,$$

and so on. Then for any k we have

$$F_k + 3F_{k+1} + F_{k+2} = F_{k+3}.$$

Compute $F_k$ for k = 100.
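Solution 5.1.1. Consider the following set of equations:

$$F_k + 3F_{k+1} + F_{k+2} = F_{k+3}, \quad F_{k+2} = F_{k+2}, \quad F_{k+1} = F_{k+1} \;\Rightarrow$$

$$\begin{bmatrix} 1 & 3 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} F_{k+2} \\ F_{k+1} \\ F_k \end{bmatrix} = \begin{bmatrix} F_{k+3} \\ F_{k+2} \\ F_{k+1} \end{bmatrix}.$$

Let $U_k = \begin{bmatrix} F_{k+2} \\ F_{k+1} \\ F_k \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Then we have

$$AU_k = U_{k+1} \quad\text{and}\quad U_0 = \begin{bmatrix} F_2 \\ F_1 \\ F_0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$

or

$$AU_0 = U_1, \quad U_2 = A^2U_0, \quad \ldots$$

Then

$$U_k = A^kU_0.$$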

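Let us compute the eigenvalues of A. Consider the equation

$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 1-\lambda & 3 & 1 \\ 1 & -\lambda & 0 \\ 0 & 1 & -\lambda \end{vmatrix} = 0 \;\Rightarrow\; -\lambda^3 + \lambda^2 + 3\lambda + 1 = 0.$$

Obviously $\lambda = -1$ is one root. Dividing $-\lambda^3 + \lambda^2 + 3\lambda + 1$ by $\lambda + 1$ we have $-\lambda^2 + 2\lambda + 1$. The other two roots are $1 \pm \sqrt{2}$. Then $\lambda_1 = -1$, $\lambda_2 = 1 + \sqrt{2}$, $\lambda_3 = 1 - \sqrt{2}$ are the roots. Let us compute some eigenvectors corresponding to these roots. For $\lambda_1 = -1$

$$(A - \lambda_1I)X = O \;\Rightarrow\; \begin{bmatrix} 2 & 3 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_1 = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$$

is one vector. For $\lambda_2 = 1 + \sqrt{2}$

$$(A - \lambda_2I)X = O \;\Rightarrow\; \begin{bmatrix} -\sqrt{2} & 3 & 1 \\ 1 & -1-\sqrt{2} & 0 \\ 0 & 1 & -1-\sqrt{2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_2 = \begin{bmatrix} 3 + 2\sqrt{2} \\ 1 + \sqrt{2} \\ 1 \end{bmatrix}$$

is one vector. For $\lambda_3 = 1 - \sqrt{2}$

$$(A - \lambda_3I)X = O \;\Rightarrow\; \begin{bmatrix} \sqrt{2} & 3 & 1 \\ 1 & -1+\sqrt{2} & 0 \\ 0 & 1 & -1+\sqrt{2} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_3 = \begin{bmatrix} 3 - 2\sqrt{2} \\ 1 - \sqrt{2} \\ 1 \end{bmatrix}$$

is one vector. Let

$$Q = (X_1, X_2, X_3) = \begin{bmatrix} 1 & 3+2\sqrt{2} & 3-2\sqrt{2} \\ -1 & 1+\sqrt{2} & 1-\sqrt{2} \\ 1 & 1 & 1 \end{bmatrix},$$

$$Q^{-1} = \frac{1}{4}\begin{bmatrix} 2 & -4 & -2 \\ \sqrt{2}-1 & 2-\sqrt{2} & 3-2\sqrt{2} \\ -\sqrt{2}-1 & 2+\sqrt{2} & 3+2\sqrt{2} \end{bmatrix}.$$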
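Therefore

$$U_k = A^kU_0 = Q\begin{bmatrix} (-1)^k & 0 & 0 \\ 0 & (1+\sqrt{2})^k & 0 \\ 0 & 0 & (1-\sqrt{2})^k \end{bmatrix}Q^{-1}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}.$$

But

$$Q^{-1}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}$$

and

$$U_k = \frac{1}{4}\begin{bmatrix} 2(-1)^{k+1} + (3+2\sqrt{2})(1+\sqrt{2})^k + (3-2\sqrt{2})(1-\sqrt{2})^k \\ 2(-1)^{k+2} + (1+\sqrt{2})^{k+1} + (1-\sqrt{2})^{k+1} \\ 2(-1)^{k+1} + (1+\sqrt{2})^k + (1-\sqrt{2})^k \end{bmatrix}.$$

When $k \to \infty$ we have $(1-\sqrt{2})^k \to 0$. Thus for k large a good approximation to $U_k$ is the following:

$$U_k = \begin{bmatrix} F_{k+2} \\ F_{k+1} \\ F_k \end{bmatrix} \approx \frac{1}{4}\begin{bmatrix} 2(-1)^{k+1} + (3+2\sqrt{2})(1+\sqrt{2})^k \\ 2(-1)^{k+2} + (1+\sqrt{2})^{k+1} \\ 2(-1)^{k+1} + (1+\sqrt{2})^k \end{bmatrix}.$$

Hence for k = 100

$$F_{100} \approx \tfrac{1}{4}\big[-2 + (1+\sqrt{2})^{100}\big], \quad F_{101} \approx \tfrac{1}{4}\big[2 + (1+\sqrt{2})^{101}\big],$$

$$F_{102} \approx \tfrac{1}{4}\big[-2 + (3+2\sqrt{2})(1+\sqrt{2})^{100}\big].$$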
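The result of Example 5.1.1 can be cross-checked against the recurrence itself. The sketch below is illustrative only; the function names are made up here. It computes the sequence exactly in integers from $F_{k+3} = F_{k+2} + 3F_{k+1} + F_k$ and compares it with the eigenvalue formula $F_k = \frac{1}{4}[2(-1)^{k+1} + (1+\sqrt{2})^k + (1-\sqrt{2})^k]$ and with the large-k approximation at k = 100.

```python
import math

def seq_by_recurrence(k):
    """F_k from F_{k+3} = F_{k+2} + 3 F_{k+1} + F_k, with F_0, F_1, F_2 = 0, 1, 1."""
    f = [0, 1, 1]
    while len(f) <= k:
        f.append(f[-1] + 3 * f[-2] + f[-3])
    return f[k]

def seq_closed_form(k):
    """F_k = (1/4) [2(-1)^(k+1) + (1 + sqrt 2)^k + (1 - sqrt 2)^k], in floating point."""
    r2 = math.sqrt(2.0)
    return (2 * (-1) ** (k + 1) + (1 + r2) ** k + (1 - r2) ** k) / 4.0

# Exact value at k = 100 versus the approximation that drops (1 - sqrt 2)^k
exact_100 = seq_by_recurrence(100)
approx_100 = (-2 + (1 + math.sqrt(2.0)) ** 100) / 4.0
relative_error = abs(approx_100 - exact_100) / exact_100
```

The relative error at k = 100 is at the level of floating-point rounding, which reflects how quickly the term in $(1-\sqrt{2})^k$ dies out.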


5.1.2 Population growth 


Consider, for example competing populations of foxes and rabbits in a given region. If 
there is no rabbit available to eat the foxes die out. If rabbits are available then for every 
kill the population of foxes has a chance of increasing. Suppose that the observations 
are made at the end of every six months, call them stages 0,1, 2, ... where stage 0 means 
the starting number. Let F; and R; denote the fox and rabbit populations at stage i. 
Suppose that the growth of fox population is governed by the difference equation 


$$F_{i+1} = 0.6F_i + 0.2R_i.$$

Left alone the rabbits multiply. Thus the rabbit population is influenced by the natural growth minus the ones killed by the foxes. Suppose that the rabbit population is given by the equation

$$R_{i+1} = 1.5R_i - pF_i$$


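where p is some number. We will look at the problem for various values of p. Suppose that the initial populations of foxes and rabbits are 10 and 100 respectively. Let us denote by

$$X_0 = \begin{pmatrix} F_0 \\ R_0 \end{pmatrix} = \begin{pmatrix} 10 \\ 100 \end{pmatrix}, \quad X_i = \begin{pmatrix} F_i \\ R_i \end{pmatrix}.$$

Then the above difference equations can be written as

$$\begin{pmatrix} F_{i+1} \\ R_{i+1} \end{pmatrix} = \begin{pmatrix} 0.6 & 0.2 \\ -p & 1.5 \end{pmatrix}\begin{pmatrix} F_i \\ R_i \end{pmatrix} \quad\text{or}\quad X_{i+1} = AX_i, \quad A = \begin{pmatrix} 0.6 & 0.2 \\ -p & 1.5 \end{pmatrix}.$$

Thus

$$X_1 = AX_0, \quad X_2 = AX_1 = A^2X_0, \quad \ldots, \quad X_k = A^kX_0.$$

For example, at the first observation period the population sizes are given by

$$X_1 = AX_0 = \begin{pmatrix} 0.6 & 0.2 \\ -p & 1.5 \end{pmatrix}\begin{pmatrix} 10 \\ 100 \end{pmatrix}.$$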
For example, for p = 1, the numbers are

$$F_1 = (0.6)(10) + (0.2)(100) = 26$$

and

$$R_1 = -1(10) + 1.5(100) = 140.$$

Let us see what happens in the second stage with the same p, that is for k = 2, p = 1. This can be computed either from the first stage values and by using A or from the initial values and by using $A^2$. That is,

$$\begin{pmatrix} F_2 \\ R_2 \end{pmatrix} = A^2\begin{pmatrix} F_0 \\ R_0 \end{pmatrix} = A\begin{pmatrix} F_1 \\ R_1 \end{pmatrix} = \begin{pmatrix} 0.6 & 0.2 \\ -1 & 1.5 \end{pmatrix}\begin{pmatrix} 26 \\ 140 \end{pmatrix} = \begin{pmatrix} 43.6 \\ 184 \end{pmatrix}, \quad F_2 = 43.6, \; R_2 = 184.$$

Note that for p = 1 the fox population and the rabbit population will explode eventually. Let us see what happens if p = 5. Then

$$X_1 = AX_0 = \begin{pmatrix} 0.6 & 0.2 \\ -5 & 1.5 \end{pmatrix}\begin{pmatrix} 10 \\ 100 \end{pmatrix} = \begin{pmatrix} 26 \\ 100 \end{pmatrix}, \quad F_1 = 26, \; R_1 = 100,$$

$$X_2 = A^2X_0 = AX_1 = \begin{pmatrix} 35.6 \\ 20 \end{pmatrix}, \quad F_2 = 35.6, \; R_2 = 20.$$

Note that at the next stage the rabbits will disappear and from then on the fox population will start decreasing at each stage.

Growths of interdependent species of animals, insects, plants and so on are gov- 
erned by difference equations of the above types. If there are three competing pop- 
ulations involved then the coefficient matrix A will be 3 x 3 and if there are n such 
populations then A will be n x n, n > 1. The long-term behavior of the populations can 
be studied by looking at the eigenvalues of A because when A is representable as 


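$$A = QDQ^{-1}, \quad D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \;\Rightarrow\; A^k = QD^kQ^{-1}$$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of the n × n matrix A. Then $\lambda^k \to 0$ as $k \to \infty$ when $|\lambda| < 1$ and $\lambda^k \to \infty$ for $\lambda > 1$ as $k \to \infty$. Thus the eventual extinction or explosion or stability of the populations is decided by the eigenvalues of A.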
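The fox and rabbit iteration can be reproduced with a few lines of code. The sketch below is illustrative, with function names chosen here: it steps $X_{i+1} = AX_i$ from the starting values (10, 100) and obtains the eigenvalues of the 2 × 2 matrix from its trace and determinant.

```python
import math

def step(state, p):
    """One stage of X_{i+1} = A X_i with A = [[0.6, 0.2], [-p, 1.5]]."""
    foxes, rabbits = state
    return (0.6 * foxes + 0.2 * rabbits, -p * foxes + 1.5 * rabbits)

def trajectory(p, stages, start=(10.0, 100.0)):
    """List of states X_0, X_1, ..., X_stages."""
    states = [start]
    for _ in range(stages):
        states.append(step(states[-1], p))
    return states

def eigenvalues(p):
    """Roots of lambda^2 - 2.1 lambda + (0.9 + 0.2 p) = 0; complex if disc < 0."""
    trace, det = 2.1, 0.6 * 1.5 + 0.2 * p
    disc = trace * trace - 4 * det
    root = math.sqrt(disc) if disc >= 0 else complex(0.0, math.sqrt(-disc))
    return ((trace + root) / 2, (trace - root) / 2)

path_p1 = trajectory(1.0, 2)   # stages 0, 1, 2 for p = 1
path_p5 = trajectory(5.0, 3)   # stages 0, 1, 2, 3 for p = 5
```

For p = 1 the eigenvalues are approximately 1.1 and 1.0, so the component along the first eigenvector grows at each stage. For p = 5 the third stage of this linear model gives a negative rabbit count, which is read as the rabbits having disappeared.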


5.1.3 Differential equations and their solutions 


Consider a system of total differential equations of the linear homogeneous type with 
constant coefficients. Suppose that a supermarket has barrels of almonds and pecans 
(two competing types of nuts as far as demand is concerned). Let u denote the amount of stock, in kilograms (kg), of almonds and v that of pecans. The store fills up the barrels according to the sales. The store finds that the rate of change of u over time is a linear function of u and v, and so also is the rate of change of v over time t. Suppose that the following are the equations.

$$\frac{du}{dt} = 2u + v, \qquad \frac{dv}{dt} = u + 2v$$

which means

$$\frac{dW}{dt} = AW, \quad W = \begin{bmatrix} u \\ v \end{bmatrix}, \quad A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}. \tag{5.1.5}$$

At the start of the observations, t = 0, suppose that the stock is u = 500 kg and v = 200 kg. If W is the vector $W = \begin{bmatrix} u \\ v \end{bmatrix}$ then we say that the initial value of W, denoted by $W_0$, is $W_0 = \begin{bmatrix} 500 \\ 200 \end{bmatrix}$. We want to solve (5.1.5) with this initial value. The differential equations in (5.1.5) are linear and homogeneous in u and v with u and v having constant (free of t) coefficients. The method that we will describe here will work for n equations in n variables $u_1, \ldots, u_n$, where each is a function of another independent variable such as t, $u_i = u_i(t)$, $i = 1, \ldots, n$, and when the right sides are linear homogeneous with constant coefficients. For simplicity we consider only a two variables case.

If there was only one equation in one variable of the type in (5.1.5) then the equation would be of the form

$$\frac{du}{dt} = au$$

where a is a known number. Then the solution is

$$u = e^{at}u_0 \quad\text{if } u = u_0 \text{ at } t = 0 \text{ (initial value).} \tag{5.1.6}$$

Then in the case of two equations as in (5.1.5) we can search for solutions of the type in (5.1.6). Let us assume that

$$u = e^{\lambda t}x_1 \quad\text{and}\quad v = e^{\lambda t}x_2, \quad X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \tag{5.1.7}$$

for some unknown $\lambda$, the same $\lambda$ for both u and v, where $x_1$ and $x_2$ are some parameters free of t. Substituting these in (5.1.5) we obtain

$$\lambda e^{\lambda t}x_1 = 2e^{\lambda t}x_1 + e^{\lambda t}x_2, \qquad \lambda e^{\lambda t}x_2 = e^{\lambda t}x_1 + 2e^{\lambda t}x_2. \tag{5.1.8}$$
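Canceling $e^{\lambda t}$ and writing the equations in matrix form we have

$$AX = \lambda X, \quad A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}. \tag{5.1.9}$$

The problem reduces to that of finding the eigenvalues and eigenvectors of A. The eigenvalues are given by

$$|A - \lambda I| = 0 \;\Rightarrow\; \begin{vmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda^2 - 4\lambda + 3 = 0 \;\Rightarrow\; \lambda_1 = 1, \; \lambda_2 = 3.$$

An eigenvector corresponding to $\lambda_1 = 1$ is given by

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

is one vector. Corresponding to $\lambda_2 = 3$,

$$\begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \;\Rightarrow\; X_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

is one vector. For $\lambda = \lambda_1 = 1$ a solution for W is

$$W_1 = e^{\lambda_1t}X_1 = e^{t}\begin{bmatrix} 1 \\ -1 \end{bmatrix}. \tag{5.1.10}$$

For $\lambda = \lambda_2 = 3$ a solution for W is

$$W_2 = e^{\lambda_2t}X_2 = e^{3t}\begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{5.1.11}$$

Any linear function of $W_1$ and $W_2$ is again a solution for W. Hence a general solution for W is

$$W = c_1W_1 + c_2W_2 = c_1e^{t}\begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_2e^{3t}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \tag{5.1.12}$$

where $c_1$ and $c_2$ are arbitrary constants. Let us try to choose $c_1$ and $c_2$ to satisfy the initial condition, $W_0 = \begin{bmatrix} 500 \\ 200 \end{bmatrix}$ for t = 0. Letting t = 0 in (5.1.12) we have

$$c_1\begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 500 \\ 200 \end{bmatrix} \;\Rightarrow\; c_1 + c_2 = 500, \; -c_1 + c_2 = 200 \;\Rightarrow\; c_2 = 350, \; c_1 = 150.$$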
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Then the solution to the equation in (5.1.5) is 


W= M = 150ef | l | +3500 H 5 
v -1 1 


u = 150e! +350e%, v= -150e! + 35007. 


Since the exponents are positive, and $e^{bt} \to \infty$ as $t \to \infty$ when $b > 0$, $u$ and $v$ both increase with time. In fact, the eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 3$, appearing in the exponents, measure the rate of growth. This can be noticed from the pure solutions in (5.1.10) and (5.1.11). A mixture of these pure solutions is what is given in (5.1.12). If an eigenvalue $\lambda$ is positive, as in (5.1.10) and (5.1.11), then $e^{\lambda t} \to \infty$ as $t \to \infty$. In this case we say that the equations are unstable. If $\lambda = 0$ the equations are said to be neutrally stable. When $\lambda < 0$, $e^{\lambda t} \to 0$ as $t \to \infty$. In this case we say that the equations are stable. In our example above, the pure solutions for both $\lambda_1 = 1$ and $\lambda_2 = 3$, as seen from (5.1.10) and (5.1.11), are unstable.
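
The eigenvalue procedure above can be checked numerically. The following is a minimal sketch, assuming NumPy is available and that the coefficient matrix is diagonalizable; the function name `solve_linear_system` is illustrative and not from the text. It expands the initial vector in the eigenvector basis and compares the result with the closed form $u = 150e^{t} + 350e^{3t}$, $v = -150e^{t} + 350e^{3t}$.

```python
import numpy as np

def solve_linear_system(A, w0, t):
    """Return W(t) for W' = A W with W(0) = w0 (A assumed diagonalizable)."""
    lam, X = np.linalg.eig(A)          # columns of X are eigenvectors of A
    c = np.linalg.solve(X, w0)         # W(0) = X c, so c holds the constants c_j
    return (X * np.exp(lam * t)) @ c   # sum_j c_j * exp(lam_j * t) * X_j

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # coefficient matrix of (5.1.9)
w0 = np.array([500.0, 200.0])            # initial values u_0, v_0
t = 0.5

u, v = solve_linear_system(A, w0, t)
u_closed = 150 * np.exp(t) + 350 * np.exp(3 * t)
v_closed = -150 * np.exp(t) + 350 * np.exp(3 * t)
print(u, u_closed)
print(v, v_closed)
```

The constants returned by `np.linalg.solve` differ from $c_1 = 150$, $c_2 = 350$ by the scaling NumPy applies to its eigenvectors, but the product $c_jX_j$, and hence $W(t)$, is the same.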

A slightly more general situation arises if there are some constant coefficients for $\frac{du}{dt}$ and $\frac{dv}{dt}$ in (5.1.5).


Example 5.1.2. Solve the following system of differential equations if $u$ and $v$ are functions of $t$ and when $t = 0$, $u = 100 = u_0$ and $v = 200 = v_0$:

$$2\frac{du}{dt} = 2u + v, \qquad 3\frac{dv}{dt} = u + 2v. \tag{5.1.13}$$


Solution 5.1.2. Divide the first equation by 2 and the second equation by 3. Then the problem reduces to that in (5.1.5). But if we want to avoid fractions at the very beginning, or to solve the system as it stands in (5.1.13), then we look for a solution of the type

$$u = e^{\lambda t}x_1, \qquad v = e^{\lambda t}x_2$$

for some $\lambda$ and for some constants $x_1$ and $x_2$. [Observe that if the original system of equations has some fractional coefficients then multiply the system by appropriate numbers to make the coefficients non-fractional. Then the following procedure can be applied.] Then the equations in (5.1.13) reduce to the following form:

$$\begin{aligned} 2\lambda e^{\lambda t}x_1 &= 2e^{\lambda t}x_1 + e^{\lambda t}x_2 \\ 3\lambda e^{\lambda t}x_2 &= e^{\lambda t}x_1 + 2e^{\lambda t}x_2. \end{aligned}$$


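Canceling $e^{\lambda t}$ and writing $X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ we have

$$\begin{bmatrix} 2\lambda & 0 \\ 0 & 3\lambda \end{bmatrix}X = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}X \;\Rightarrow\; \begin{bmatrix} 2-2\lambda & 1 \\ 1 & 2-3\lambda \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

If this equation has a non-null solution then

$$\begin{vmatrix} 2-2\lambda & 1 \\ 1 & 2-3\lambda \end{vmatrix} = 0 \;\Rightarrow\; 6\lambda^2 - 10\lambda + 3 = 0 \;\Rightarrow\; \lambda_1 = \frac{5+\sqrt{7}}{6},\quad \lambda_2 = \frac{5-\sqrt{7}}{6}.$$

Let us compute $X$ corresponding to $\lambda_1$ and $\lambda_2$. For $\lambda_1 = \frac{5+\sqrt{7}}{6}$,

$$\begin{bmatrix} 2 - 2\left(\frac{5+\sqrt{7}}{6}\right) & 1 \\ 1 & 2 - 3\left(\frac{5+\sqrt{7}}{6}\right) \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

One solution for $X$ is

$$X_1 = \begin{bmatrix} 1 \\ -\frac{1}{3} + \frac{\sqrt{7}}{3} \end{bmatrix}, \quad\text{and similarly for } \lambda_2, \quad X_2 = \begin{bmatrix} 1 \\ -\frac{1}{3} - \frac{\sqrt{7}}{3} \end{bmatrix}.$$

For $\lambda = \lambda_1$ one solution for $W$ is

$$W_1 = e^{\lambda_1 t}X_1 = e^{\left(\frac{5+\sqrt{7}}{6}\right)t}\begin{bmatrix} 1 \\ -\frac{1}{3} + \frac{\sqrt{7}}{3} \end{bmatrix}$$

and for $\lambda = \lambda_2$ the solution for $W$ is

$$W_2 = e^{\lambda_2 t}X_2 = e^{\left(\frac{5-\sqrt{7}}{6}\right)t}\begin{bmatrix} 1 \\ -\frac{1}{3} - \frac{\sqrt{7}}{3} \end{bmatrix}.$$

Thus a general solution for $W$ is $W = c_1W_1 + c_2W_2$ where $c_1$ and $c_2$ are arbitrary constants. That is,

$$u = c_1e^{\left(\frac{5+\sqrt{7}}{6}\right)t} + c_2e^{\left(\frac{5-\sqrt{7}}{6}\right)t}$$

and

$$v = c_1\left[-\frac{1}{3} + \frac{\sqrt{7}}{3}\right]e^{\left(\frac{5+\sqrt{7}}{6}\right)t} + c_2\left[-\frac{1}{3} - \frac{\sqrt{7}}{3}\right]e^{\left(\frac{5-\sqrt{7}}{6}\right)t}.$$

But for $t = 0$, $u = u_0 = 100$ and for $t = 0$, $v = v_0 = 200$. That is,

$$100 = c_1 + c_2, \qquad 200 = c_1\left(-\frac{1}{3} + \frac{\sqrt{7}}{3}\right) + c_2\left(-\frac{1}{3} - \frac{\sqrt{7}}{3}\right).$$

Solving for $c_1$ and $c_2$ we have

$$c_1 = 50(1+\sqrt{7}) \quad\text{and}\quad c_2 = 50(1-\sqrt{7}).$$

Hence the solution satisfying the initial conditions is

$$u = 50(1+\sqrt{7})e^{\left(\frac{5+\sqrt{7}}{6}\right)t} + 50(1-\sqrt{7})e^{\left(\frac{5-\sqrt{7}}{6}\right)t},$$

$$v = 50(1+\sqrt{7})\left(-\frac{1}{3} + \frac{\sqrt{7}}{3}\right)e^{\left(\frac{5+\sqrt{7}}{6}\right)t} + 50(1-\sqrt{7})\left(-\frac{1}{3} - \frac{\sqrt{7}}{3}\right)e^{\left(\frac{5-\sqrt{7}}{6}\right)t} = 100e^{\left(\frac{5+\sqrt{7}}{6}\right)t} + 100e^{\left(\frac{5-\sqrt{7}}{6}\right)t}.$$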
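
As a numerical cross-check of Solution 5.1.2, the system $2u' = 2u + v$, $3v' = u + 2v$ can be written as $\lambda BX = AX$ with $B = \mathrm{diag}(2, 3)$, which is the ordinary eigenvalue problem for $B^{-1}A$. The sketch below assumes NumPy; the names `B`, `A` and `W` are illustrative choices, not notation fixed by the text.

```python
import numpy as np

B = np.diag([2.0, 3.0])                    # coefficients of du/dt and dv/dt
A = np.array([[2.0, 1.0], [1.0, 2.0]])     # right-hand side coefficients
w0 = np.array([100.0, 200.0])              # u_0 = 100, v_0 = 200

# lambda * B X = A X  is equivalent to  (B^{-1} A) X = lambda X
lam, X = np.linalg.eig(np.linalg.solve(B, A))
c = np.linalg.solve(X, w0)                 # constants fitted to the initial values

def W(t):
    """Solution vector (u(t), v(t)) built from the eigenpairs."""
    return (X * np.exp(lam * t)) @ c

roots = np.sort(lam)
expected = np.array([(5 - np.sqrt(7)) / 6, (5 + np.sqrt(7)) / 6])
print(roots, expected)
print(W(1.0)[1], 100 * np.exp(expected[1]) + 100 * np.exp(expected[0]))
```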


Note that the same procedure works if we have $m$-th order equations of the type

$$\begin{aligned} b_1\frac{d^m u_1}{dt^m} &= a_{11}u_1 + \cdots + a_{1k}u_k \\ &\ \ \vdots \\ b_k\frac{d^m u_k}{dt^m} &= a_{k1}u_1 + \cdots + a_{kk}u_k \end{aligned} \tag{5.1.14}$$

where $b_1, \dots, b_k$ and the $a_{ij}$'s are all constants and $u_j$, $j = 1, \dots, k$ are functions of $t$. In this case look for a solution of the type $u_j = e^{\mu t}x_j$, $j = 1, \dots, k$ with the same $\mu$, where the $x_j$'s are some quantities free of $t$. Then the left sides of (5.1.14) will contain $\mu^m$. Put $\lambda = \mu^m$. Then the problem reduces to the one in Example 5.1.2.


Higher order differential equations can also be solved by using the same technique 
as above. In order to illustrate the procedure we will do a simple example here. 


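Example 5.1.3. Let $y$ be a function of $t$ and let $y'$, $y''$, $y'''$ denote the first order, second order and third order derivatives respectively. Solve the following differential equation by using the eigenvalue method:

$$y''' - 4y'' + 3y' = 0.$$

Solution 5.1.3. The classical way of doing the problem is to search for an exponential solution of the type $y = e^{\lambda t}$. Then we get the characteristic equation

$$\lambda^3 - 4\lambda^2 + 3\lambda = 0 \;\Rightarrow\; \lambda_1 = 0,\ \lambda_2 = 1,\ \lambda_3 = 3$$

are the solutions of this characteristic equation. Hence the three pure exponential solutions are $e^{0t} = 1$, $e^{t}$, $e^{3t}$. Now let us do the same problem by using eigenvalues. Let

$$u = y', \quad v = y'' = u', \quad v' = 4v - 3u \quad\text{and}$$

$$W = \begin{bmatrix} y \\ u \\ v \end{bmatrix} \;\Rightarrow\; W' = \begin{bmatrix} y' \\ u' \\ v' \end{bmatrix} = \begin{bmatrix} y' \\ y'' \\ y''' \end{bmatrix}.$$

Writing the above three equations in terms of the vector $W$ and its first derivative we have

$$W' = AW, \qquad A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & -3 & 4 \end{bmatrix}. \tag{5.1.15}$$

Now, compare with (5.1.5). We have a first order system in $W$. Let $y = e^{\lambda t}x_1$, $u = e^{\lambda t}x_2$, $v = e^{\lambda t}x_3$ for some $x_1, x_2, x_3$ free of $t$. Then substituting in (5.1.15) and canceling $e^{\lambda t}$ the equation $W' = AW$ reduces to the form

$$AX = \lambda X, \qquad X = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \tag{5.1.16}$$

or the problem reduces to an eigenvalue problem. The eigenvalues of $A$ are $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 3$. Some eigenvectors corresponding to these eigenvalues are the following:

$$X_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad X_3 = \begin{bmatrix} 1 \\ 3 \\ 9 \end{bmatrix}$$

which gives

$$W_1 = e^{\lambda_1 t}X_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad W_2 = e^{\lambda_2 t}X_2 = \begin{bmatrix} e^{t} \\ e^{t} \\ e^{t} \end{bmatrix}, \quad W_3 = e^{\lambda_3 t}X_3 = \begin{bmatrix} e^{3t} \\ 3e^{3t} \\ 9e^{3t} \end{bmatrix}.$$

Thus the pure solutions for $y$ are $1$, $e^{t}$ and $e^{3t}$. A general solution for $y$ is then

$$y = c_1 + c_2e^{t} + c_3e^{3t} \tag{5.1.17}$$

where $c_1, c_2, c_3$ are arbitrary constants.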
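
The reduction of $y''' - 4y'' + 3y' = 0$ to a first order system can be verified numerically. The sketch below assumes NumPy and uses the state vector $W = (y, y', y'')'$; the rescaling of each eigenvector so that its first component equals 1 is a presentational choice made here, not part of the method.

```python
import numpy as np

# y''' - 4y'' + 3y' = 0 written as W' = A W with W = (y, y', y'')'
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, -3.0, 4.0]])

lam, X = np.linalg.eig(A)
order = np.argsort(lam)            # sort eigenpairs by eigenvalue
lam, X = lam[order], X[:, order]
X = X / X[0]                       # scale each column so its first entry is 1

print(lam)   # eigenvalues, the exponents of the pure solutions
print(X)     # columns of the form (1, lambda, lambda^2)'
```

Each scaled eigenvector has the form $(1, \lambda, \lambda^2)'$, which is why the first component of $e^{\lambda t}X$ reproduces the pure solution $e^{\lambda t}$ for $y$.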


Exercises 5.1 


5.1.1. Suppose that the population of a bacteria colony increases by the following law. If the population at the $k$-th stage is $a_k$ then

$$a_k + 2a_{k+1} = a_{k+2}.$$

Compute the population size at the 100-th stage ($k = 100$) if the initial numbers are $a_0 = 1$, $a_1 = 1$.


5.1.2. Prove that every third Fibonacci number is even, starting with 1, 1. 


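5.1.3. Show that the Fibonacci sequence $F_k + F_{k+1} = F_{k+2}$ with $F_0 = 0$, $F_1 = 1$ is such that

$$\lim_{k\to\infty}\frac{F_{k+1}}{F_k} = \frac{1+\sqrt{5}}{2} = \text{golden ratio}.$$

5.1.4. Consider a sequence $F_k + F_{k+1} = F_{k+2}$ with $F_0 = 0$, $F_1 = a > 0$. Show that

$$\lim_{k\to\infty}\frac{F_{k+1}}{F_k} = \frac{1+\sqrt{5}}{2} = \text{golden ratio}.$$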


5.1.5. Find the limiting value, $\lim_{k\to\infty}W_k = \lim_{k\to\infty}\begin{pmatrix} u_k \\ v_k \end{pmatrix}$, for the following systems:


1 1 
a Uk, = —Up + 2Vj, Up =2, 
(a) ket = 3Uk + 5e Uo 
V al iiy Vo = 1; 
ka 7 3Hk t 5e 054 


(b) Ug, = 0.4uy + 0.3v, Uo= 
Vig = 0.6uy + O.7Vj, Vo= 


5.1.6. For the sequence $F_{k+2} = \frac{1}{2}[F_k + F_{k+1}]$ compute $\lim_{k\to\infty}F_k$.


5.1.7. Solve the following systems of differential equations by using matrix methods: 


dv dw 
a — —-2u*v, —-u-«2v-w, ——--v-«2w. 
(a) dt dt dt 


dv dw 
V — =u+2v-w, — =2u+3v+2w, —— -3u-«2v-«w. 
v) dt dt 
5.1.8. In physics, an oscillating system with unequal masses $m_1$ and $m_2$ is governed by the following system of differential equations:


du dv 


m qg 77% +Y [ET UY 


where $u$ and $v$ are functions of $t$. Solve the system for $m_1 = 1$, $m_2 = 2$ and with the initial conditions, at $t = 0$, $u = u_0 = -1$ and $v = v_0 = 1$.


5.1.9. A system of differential equations governing the diffusion of a chemical in two 
different concentrations is the following: 


du 
dc P EDS 
dv 
dr E Qe) 


where $u = u(t)$ and $v = v(t)$ denote the concentrations. Solve the system when $u_0$ and $v_0$ are the respective concentrations at $t = 0$.


v dt dr 
Solve the following systems of differential equations: 


(a) wu ML m=] 


5.1.10. Let 


dt |1 1 1 
dw [1 4 2 
(b) (4 NL m-f]. 


5.1.11. Solve the following system of differential equations by using eigenvalue 
method, if possible, where y is a function of t and primes denote the derivatives. 

(a) y”’-5y”+4y’=0; 

(b) y”+y=0; 

(c) y”=0; 

(d) y”+ay’+by=0, 


where a and b are constants. 


5.1.12. A biologist has found that the owl population u and the mice population v in a 
particular area are governed by the following system of differential equations, where 
t denotes time: 


du 
— = 2u 4 2v 
dt 
dv 
— -u-* 2v. 
dt 
(a) Solve the system if the initial values are $u_0 = 2$, $v_0 = 100$. (b) Is the system stable, neutrally stable, or unstable? (c) What will be the proportions of mice and owls in the long run, that is, when $t \to \infty$?


5.1.13. In Exercise 5.1.12 suppose that the inventory of the owl population $u$ and the mice population $v$ is taken at the beginning of every year. At the $i$-th year let these be $u_i$ and $v_i$ respectively. At the beginning of the observation period let the population sizes be $u_0 = 5$, $v_0 = 60$, that is, for $i = 0$. Suppose that it is found that the population growth is governed by the following difference equations

$$u_{i+1} = 4u_i - 2v_i \quad\text{and}\quad v_{i+1} = -5u_i + 2v_i.$$

Compute $u_i$ and $v_i$ for $i = 1, 2, 3, \infty$.


5.1.14. Suppose that two falcons are introduced into the same region of the owl and mice habitat of Exercise 5.1.13. Thus the initial population sizes of falcon, owl and mice are $f_0 = 2$, $u_0 = 5$, $v_0 = 60$. The falcons kill both owls and mice, and the owls kill mice and not falcons. Suppose that the difference equations governing the population growth are the following:

$$\begin{aligned} f_{i+1} &= f_i + 2u_i - 2v_i, \\ u_{i+1} &= 2f_i + 3u_i - 3v_i, \\ v_{i+1} &= -2f_i - 3u_i + 3v_i. \end{aligned}$$

Compute the population sizes of falcon, owl and mice at $i = 1, 2, 3, 4, \infty$.


5.1.15. In Exercise 5.1.14 if the observation period $t$ is continuous and if the rates of change of the falcon, owl and mice populations, with respect to $t$, are governed by the differential equations

$$\frac{df}{dt} = f + u + v, \qquad \frac{du}{dt} = -f + u + 2v, \qquad \frac{dv}{dt} = f - u - 2v$$

with the initial populations, at $t = 0$, respectively $f_0 = 2$, $u_0 = 5$, $v_0 = 60$, then compute the eventual ($t \to \infty$) population sizes of falcon, owl and mice in that region.


5.2 Jacobians of matrix transformations and functions of matrix 
argument 


This field is very vast and hence we will select a few topics on Jacobians and intro- 
duce some functions of matrix argument for the purpose of illustrating the possible 
applications of matrices and determinants. 




5.2.1 Jacobians of matrix transformations 


Jacobians in some linear and non-linear transformations are discussed in Section 3.3.3 of Chapter 3. One linear transformation appearing as Exercise 3.3.7 will be restated here because it is concerned with a real symmetric matrix. Let $X = X'$ be a real symmetric $p \times p$ matrix of functionally independent (distinct) $p(p+1)/2$ real scalar variables and let $A$ be a $p \times p$ nonsingular matrix of constants. Then, ignoring the sign, we have the following result:

$$\text{(i)}\qquad Y = AXA' \;\Rightarrow\; dY = \begin{cases} |A|^{p+1}\,dX, & \text{for } X = X',\ |A| \ne 0 \\ |A|^{2p}\,dX, & \text{for a general } X,\ |A| \ne 0. \end{cases}$$

The second part for a general $X$ of $p^2$ distinct elements follows as a particular case of the multi-linear transformation of Section 3.3.3.


Example 5.2.1. Let $X$ be a $p \times 1$ vector of real scalar variables, $V$ a $p \times p$ real symmetric positive definite matrix of constants. Consider the function

$$f(X) = c\,e^{-\frac{1}{2}(X-\mu)'V^{-1}(X-\mu)}$$

where $\mu$ is a $p \times 1$ vector of constants and $c$ is a constant. If $f(X)$ is a statistical density then evaluate $c$.


Solution 5.2.1. For $f(X)$ to be a density, two conditions are to be satisfied: (1) $f(X) \ge 0$ for all $X$; (2) $\int_X f(X)\,dX = 1$ where $\int_X$ denotes the integral over $X$ and $dX = dx_1 \wedge \cdots \wedge dx_p$. Let us check the conditions. Since the exponential function cannot be negative, condition (1) is satisfied if $c > 0$. Let us look for the total integral. Since $V = V' > O$ (positive definite) we can write $V^{-1}$ in the form $V^{-1} = BB'$ for some nonsingular matrix $B$. Then

$$(X-\mu)'V^{-1}(X-\mu) = (X-\mu)'BB'(X-\mu).$$

Let

$$Y = B'(X-\mu) \;\Rightarrow\; dY = |B'|\,d(X-\mu) = |B'|\,dX$$

since $\mu$ is a constant vector, $d(X-\mu) = dX$, and $|B'| = |B|$ from property (i) of Section 3.3. But

$$|V^{-1}| = |BB'| = |B||B'| = |B|^2 \;\Rightarrow\; |B| = |V|^{-\frac{1}{2}}.$$

Now, consider the integral

$$\int_X f(X)\,dX = c\int_X e^{-\frac{1}{2}(X-\mu)'V^{-1}(X-\mu)}\,dX = c\,|V|^{\frac{1}{2}}\int_Y e^{-\frac{1}{2}Y'Y}\,dY$$

where $Y = B'(X-\mu)$, $V^{-1} = BB'$. But $Y'Y = y_1^2 + \cdots + y_p^2$ where $y_j$ is the $j$-th element in $Y$. Thus, the integral reduces to the form

$$\int_Y e^{-\frac{1}{2}Y'Y}\,dY = \prod_{j=1}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j = \prod_{j=1}^{p}\sqrt{2\pi} = (2\pi)^{\frac{p}{2}}.$$

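This integral is evaluated by using a gamma integral:

$$\int_{-\infty}^{\infty}e^{-u^2}\,du = 2\int_0^{\infty}e^{-u^2}\,du$$

due to the property of an even function, provided $\int_0^{\infty}e^{-u^2}\,du$ is convergent. Put $u^2 = z \Rightarrow u = z^{\frac{1}{2}}$ since $u > 0$, and $du = \frac{1}{2}z^{-\frac{1}{2}}\,dz$. From the gamma integral, one has

$$\int_0^{\infty}x^{\alpha-1}e^{-x}\,dx = \Gamma(\alpha), \quad \Re(\alpha) > 0$$

and $\Gamma(\frac{1}{2}) = \sqrt{\pi}$. Hence $\int_{-\infty}^{\infty}e^{-u^2}\,du = \Gamma(\frac{1}{2}) = \sqrt{\pi}$ and, on replacing $u$ by $y_j/\sqrt{2}$, $\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j = \sqrt{2\pi}$. That is,

$$\prod_{j=1}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j = (2\pi)^{\frac{p}{2}}.$$

For $f(X)$ to be a statistical density, the total integral has to be unity, which means that for

$$c = \frac{1}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}$$

this $f(X)$ is a density. Then

$$f(X) = \frac{e^{-\frac{1}{2}(X-\mu)'V^{-1}(X-\mu)}}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}, \quad V > O$$

is a density and it is called the $p$-variate nonsingular normal or Gaussian density, for $-\infty < x_j < \infty$, $-\infty < \mu_j < \infty$, $X' = (x_1, \dots, x_p)$, $\mu' = (\mu_1, \dots, \mu_p)$, $V > O$. It is usually written as $X \sim N_p(\mu, V)$. [$X$ is distributed as a $p$-variate real Gaussian or normal density with the parameters $\mu$ (the mean value vector) and $V$ (the covariance matrix).] This is the fundamental density in multivariate statistical analysis.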
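
The normalizing constant $c = 1/\big((2\pi)^{p/2}|V|^{1/2}\big)$ can be checked numerically for $p = 2$. The following is a rough sketch assuming NumPy; the particular $\mu$ and $V$, the grid limits and the function name `gaussian_density` are illustrative choices. It sums the density over a fine grid and compares the total with 1.

```python
import numpy as np

def gaussian_density(X, mu, V):
    """p-variate Gaussian density evaluated at points X[..., p]."""
    p = len(mu)
    d = X - mu
    # quadratic form (X - mu)' V^{-1} (X - mu) at every grid point
    quad = np.einsum('...i,ij,...j->...', d, np.linalg.inv(V), d)
    c = 1.0 / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(V)))
    return c * np.exp(-0.5 * quad)

mu = np.array([1.0, -1.0])                   # illustrative mean vector
V = np.array([[2.0, 0.5], [0.5, 1.0]])       # illustrative positive definite V

grid = np.linspace(-12.0, 12.0, 601)
h = grid[1] - grid[0]
x1, x2 = np.meshgrid(grid, grid, indexing='ij')
pts = np.stack([x1, x2], axis=-1)

total = gaussian_density(pts, mu, V).sum() * h * h   # Riemann sum of the density
print(total)
```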


Example 5.2.2. Show that if $X \sim N_p(\mu, V)$, $V > O$ then the mean value of $X$ or the expected value of $X$ or $E(X)$ is $\mu$ and $V$ is the covariance matrix of $X$.

Solution 5.2.2. The real nonsingular $p$-variate Gaussian density is given by

$$f(X) = \frac{1}{(2\pi)^{\frac{p}{2}}|V|^{\frac{1}{2}}}\,e^{-\frac{1}{2}(X-\mu)'V^{-1}(X-\mu)}.$$

Then the expected value of $X$, denoted by $E(X)$, is given by the integral

$$E(X) = \int_X Xf(X)\,dX.$$

For convenience, let us write $X = (X-\mu) + \mu$ and $Y = B'(X-\mu)$, $V^{-1} = BB'$. Then $(X-\mu) = (B')^{-1}Y$, $dY = |B|\,dX = |V|^{-\frac{1}{2}}\,dX$. That is,

$$E(X) = \mu\int_X f(X)\,dX + \frac{(B')^{-1}}{(2\pi)^{\frac{p}{2}}}\int_Y Y\,e^{-\frac{1}{2}Y'Y}\,dY.$$

But from Example 5.2.1, $\int_X f(X)\,dX = 1$ and each integral in the second term is zero because each element in $Ye^{-\frac{1}{2}Y'Y}$ is an odd function of the type $y_je^{-\frac{1}{2}y_j^2}$ where $Y = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix}$. Then the integral over $y_j$ will be of the following form, for example, for $j = 1$:

$$\int_Y y_1e^{-\frac{1}{2}Y'Y}\,dY = \left[\int_{-\infty}^{\infty}y_1e^{-\frac{1}{2}y_1^2}\,dy_1\right]\times\prod_{j=2}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j. \tag{a}$$

But the integral over $y_1$ is zero due to $y_1e^{-\frac{1}{2}y_1^2}$ being odd and the integral existing. The second factor in (a) above is only a finite constant, namely $(\sqrt{2\pi})^{p-1}$. Thus, the integral over each element in $Y$ will be zero. Hence $E(X) = \mu$. Now, consider the covariance of $X$. By definition

$$\mathrm{Cov}(X) = E\{[X - E(X)][X - E(X)]'\} = E[(X-\mu)(X-\mu)'] = E[(B')^{-1}YY'B^{-1}]$$


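for $Y = B'(X-\mu) \Rightarrow dY = |V|^{-\frac{1}{2}}\,dX$. Then

$$\mathrm{Cov}(X) = \frac{1}{(2\pi)^{\frac{p}{2}}}(B')^{-1}\left[\int_Y YY'e^{-\frac{1}{2}Y'Y}\,dY\right]B^{-1},$$

where $|V|^{\frac{1}{2}}$ is canceled. Here

$$YY' = \begin{bmatrix} y_1 \\ \vdots \\ y_p \end{bmatrix}[y_1, \dots, y_p] = \begin{bmatrix} y_1^2 & \cdots & y_1y_p \\ \vdots & \ddots & \vdots \\ y_py_1 & \cdots & y_p^2 \end{bmatrix}.$$

The integral for a diagonal element, say the first diagonal element, is of the form

$$\int_{-\infty}^{\infty}y_1^2e^{-\frac{1}{2}y_1^2}\,dy_1 \times \prod_{j=2}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j. \tag{i}$$

Consider

$$\int_{-\infty}^{\infty}y_1^2e^{-\frac{1}{2}y_1^2}\,dy_1 = 2\int_0^{\infty}y_1^2e^{-\frac{1}{2}y_1^2}\,dy_1 \tag{ii}$$

since $y_1^2e^{-\frac{1}{2}y_1^2}$ is an even function. Put $u = \frac{1}{2}y_1^2 \Rightarrow \frac{1}{\sqrt{2}}u^{-\frac{1}{2}}\,du = dy_1$ because $y_1$ in the integral on the right in (ii) is positive. Now, the right side of (ii) becomes

$$2\sqrt{2}\int_0^{\infty}u^{\frac{3}{2}-1}e^{-u}\,du = 2\sqrt{2}\,\Gamma\!\left(\tfrac{3}{2}\right) = \sqrt{2}\sqrt{\pi} = \sqrt{2\pi}.$$

This means the first diagonal element in the integral in (i), namely,

$$\int_Y y_1^2e^{-\frac{1}{2}Y'Y}\,dY = \sqrt{2\pi}\prod_{j=2}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j = (2\pi)^{\frac{p}{2}}$$

since $\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j = \sqrt{2\pi}$ for each $j = 2, \dots, p$. Thus, all diagonal elements in $\int_Y YY'e^{-\frac{1}{2}Y'Y}\,dY$ are equal to $(2\pi)^{\frac{p}{2}}$. Now, consider one non-diagonal element, say the first row second column element. This will be of the form

$$\int_Y y_1y_2e^{-\frac{1}{2}Y'Y}\,dY = A\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}y_1y_2e^{-\frac{1}{2}(y_1^2+y_2^2)}\,dy_1 \wedge dy_2 \tag{iii}$$

where

$$A = \prod_{j=3}^{p}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_j^2}\,dy_j. \tag{iv}$$

In (iii) the integrand for each of $y_1$ and $y_2$ is an odd function and since $\int_0^{\infty}y_je^{-\frac{1}{2}y_j^2}\,dy_j < \infty$ (finite) the integrals over $y_1$ and $y_2$ in (iii) are zeros due to the odd function property. Note that (iv) only produces a finite quantity, namely, $(\sqrt{2\pi})^{p-2}$. Thus, each non-diagonal element in the integral $\int_Y YY'e^{-\frac{1}{2}Y'Y}\,dY$ is zero, or one can write

$$\frac{1}{(2\pi)^{\frac{p}{2}}}\int_Y (YY')e^{-\frac{1}{2}(Y'Y)}\,dY = I$$

where $I$ is the identity matrix of order $p$. Then the covariance matrix becomes

$$\mathrm{Cov}(X) = (B')^{-1}IB^{-1} = (B')^{-1}B^{-1} = (BB')^{-1} = V$$

or the parameter $V$ in the density is the covariance matrix there.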
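
The change of variables $Y = B'(X-\mu)$ with $V^{-1} = BB'$ can be illustrated by simulation. The sketch below assumes NumPy and takes $B$ to be the Cholesky factor of $V^{-1}$, which is one convenient choice among many; the sample size, seed, $\mu$ and $V$ are illustrative. It checks $(B')^{-1}B^{-1} = V$ exactly and compares sample moments of $X = \mu + (B')^{-1}Y$ with $\mu$ and $V$.

```python
import numpy as np

mu = np.array([1.0, -1.0])                    # illustrative mean vector
V = np.array([[2.0, 0.5], [0.5, 1.0]])        # illustrative covariance matrix

B = np.linalg.cholesky(np.linalg.inv(V))      # V^{-1} = B B' (B lower triangular)
Bt_inv = np.linalg.inv(B.T)                   # (B')^{-1}

exact_cov = Bt_inv @ Bt_inv.T                 # (B')^{-1} I B^{-1}, should equal V

rng = np.random.default_rng(0)
Y = rng.standard_normal((200_000, 2))         # rows are independent N_2(O, I) vectors
X = mu + Y @ Bt_inv.T                         # each row is mu + (B')^{-1} y

sample_mean = X.mean(axis=0)
sample_cov = np.cov(X, rowvar=False)
print(exact_cov)
print(sample_mean)
print(sample_cov)
```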


Example 5.2.3. Let $X$ be an $m \times n$ matrix of $mn$ distinct real scalar variables, and let $A$ ($m \times m$) and $B$ ($n \times n$) be real positive definite constant matrices. If the function

$$f(X) = c\,e^{-\frac{1}{2}\mathrm{tr}(AXBX')}$$

for $X = (x_{ij})$, $-\infty < x_{ij} < \infty$, $i = 1, \dots, m$, $j = 1, \dots, n$, where $\mathrm{tr}(\cdot)$ denotes the trace of the matrix $(\cdot)$, is a statistical density then evaluate $c$.

Solution 5.2.3. Since $A = A' > O$, $B = B' > O$ we can write $A = CC'$, $B = GG'$ where $C$ and $G$ are nonsingular matrices. Then

$$\mathrm{tr}(AXBX') = \mathrm{tr}(CC'XGG'X') = \mathrm{tr}(C'XGG'X'C) = \mathrm{tr}(YY'), \quad Y = C'XG.$$

In writing this we have used the property that for two matrices $P$ and $Q$, whenever $PQ$ and $QP$ are defined, $\mathrm{tr}(PQ) = \mathrm{tr}(QP)$ where $PQ$ need not be equal to $QP$. From Section 3.3 we have

$$Y = C'XG \;\Rightarrow\; dY = |C'|^{n}|G|^{m}\,dX.$$

Note that $|A| = |CC'| = |C||C'| = |C|^2$ or $|C| = |A|^{\frac{1}{2}}$. Similarly, $|G| = |B|^{\frac{1}{2}}$. Therefore

$$dY = |A|^{\frac{n}{2}}|B|^{\frac{m}{2}}\,dX. \tag{i}$$

But for any matrix $Z$, $\mathrm{tr}(ZZ')$ = sum of squares of all elements in $Z$. Hence

$$\mathrm{tr}(YY') = \sum_{i=1}^{m}\sum_{j=1}^{n}y_{ij}^2$$

or

$$\int_Y e^{-\frac{1}{2}\mathrm{tr}(YY')}\,dY = \prod_{i=1}^{m}\prod_{j=1}^{n}\int_{-\infty}^{\infty}e^{-\frac{1}{2}y_{ij}^2}\,dy_{ij} = (\sqrt{2\pi})^{mn}.$$

The total integral of $f(X)$ being one implies that

$$c = \frac{|A|^{\frac{n}{2}}|B|^{\frac{m}{2}}}{(2\pi)^{\frac{mn}{2}}}$$

or

$$f(X) = \frac{|A|^{\frac{n}{2}}|B|^{\frac{m}{2}}}{(2\pi)^{\frac{mn}{2}}}\,e^{-\frac{1}{2}\mathrm{tr}(AXBX')}$$

for $X = (x_{ij})$, $-\infty < x_{ij} < \infty$ for all $i$ and $j$, $A > O$, $B > O$, is known as the real matrix-variate Gaussian or normal density. In the above case $E(X) = O$ (null). If we replace $X$ by $X - M$, where $M$ is an $m \times n$ constant matrix, then $E(X) = M$ and there will be three parameter matrices $A$, $B$, $M$.


Jacobians in one non-linear transformation in the form of a triangular reduction was considered in Section 3.3. Now let us consider a few more non-linear transformations. A very important non-linear transformation is the nonsingular matrix going to its inverse. Let $X$ be a nonsingular $p \times p$ matrix and let $Y = X^{-1}$ be its regular inverse. Then what is the relationship between $dY$ and $dX$? This can be achieved by the following procedure. Let $X = (x_{ij})$. We have $XX^{-1} = I$, the $p \times p$ identity matrix. If we differentiate both sides with respect to some $\theta$ we get

$$O = \left(\frac{\partial}{\partial\theta}X\right)X^{-1} + X\left(\frac{\partial}{\partial\theta}X^{-1}\right) \;\Rightarrow\; \frac{\partial}{\partial\theta}X^{-1} = -X^{-1}\left(\frac{\partial}{\partial\theta}X\right)X^{-1}.$$

Hence if we consider the differentials $dx_{ij}$ and the matrix of differentials, denoted by $(dX) = (dx_{ij})$, then we have the relationship

$$(dX^{-1}) = -X^{-1}(dX)X^{-1}.$$

Let $V = (v_{ij}) = (dX^{-1})$ and $U = (u_{ij}) = (dX)$; then we have

$$V = -X^{-1}UX^{-1}. \tag{a}$$

Note that in (a) only $U$ and $V$ contain differentials and $X^{-1}$ does not contain any differential, or $X^{-1}$ acts as a constant matrix. Now, by taking the wedge product of these differentials and using property (iv) of Section 3.3 we have

$$dV = \begin{cases} |X|^{-2p}\,dU & \text{for a general real } X \\ |X|^{-(p+1)}\,dU & \text{for } X = X' \text{ and real.} \end{cases}$$

Thus we have the following result:

$$\text{(ii)}\qquad Y = X^{-1} \;\Rightarrow\; dY = \begin{cases} |X|^{-2p}\,dX & \text{for a general real } X \\ |X|^{-(p+1)}\,dX & \text{for } X = X' \text{ and real.} \end{cases}$$
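
The Jacobian of $Y = X^{-1}$, namely $|X|^{-2p}$ for a general real $X$ and $|X|^{-(p+1)}$ for $X = X'$, can be checked by finite differences for $p = 2$. The sketch below assumes NumPy; the helper names and the two test matrices are illustrative. In the symmetric case only the $p(p+1)/2$ distinct elements (the upper triangle) are treated as variables.

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian matrix of a map f: R^n -> R^n at x."""
    n = len(x)
    J = np.empty((n, n))
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        J[:, k] = (f(x + e) - f(x - e)) / (2 * h)
    return J

p = 2
iu = np.triu_indices(p)            # positions of the distinct elements of X = X'

def inv_general(x):
    """All p^2 elements of X  ->  all p^2 elements of X^{-1}."""
    return np.linalg.inv(x.reshape(p, p)).ravel()

def inv_symmetric(x):
    """Distinct elements of symmetric X  ->  distinct elements of X^{-1}."""
    X = np.zeros((p, p))
    X[iu] = x
    X = X + X.T - np.diag(np.diag(X))
    return np.linalg.inv(X)[iu]

Xg = np.array([[2.0, 1.0], [0.5, 3.0]])   # a general nonsingular matrix
Xs = np.array([[2.0, 1.0], [1.0, 3.0]])   # a symmetric nonsingular matrix

jac_general = abs(np.linalg.det(numerical_jacobian(inv_general, Xg.ravel())))
jac_symmetric = abs(np.linalg.det(numerical_jacobian(inv_symmetric, Xs[iu])))

print(jac_general, abs(np.linalg.det(Xg)) ** (-2 * p))
print(jac_symmetric, abs(np.linalg.det(Xs)) ** (-(p + 1)))
```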


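Example 5.2.4. A real $p \times p$ matrix-variate gamma density is given by

$$f(X) = \begin{cases} \dfrac{|X|^{\alpha - \frac{p+1}{2}}e^{-\mathrm{tr}(X)}}{\Gamma_p(\alpha)}, & X = X' > O,\ \Re(\alpha) > \frac{p-1}{2} \\ 0, & \text{elsewhere} \end{cases} \tag{5.2.1}$$

where the real matrix-variate gamma function is given by

$$\Gamma_p(\alpha) = \pi^{\frac{p(p-1)}{4}}\,\Gamma(\alpha)\,\Gamma\!\left(\alpha - \tfrac{1}{2}\right)\cdots\Gamma\!\left(\alpha - \tfrac{p-1}{2}\right), \quad \Re(\alpha) > \frac{p-1}{2}. \tag{5.2.2}$$

Evaluate the density of $Y = X^{-1}$.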


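Solution 5.2.4. Here $X = X' > O$ and hence $dY = |X|^{-(p+1)}\,dX$ or $dX = |X|^{p+1}\,dY = |Y|^{-(p+1)}\,dY$. Since the transformation $X$ to $X^{-1}$ is one-to-one, if the density of $Y$ is denoted by $g(Y)$ then $g(Y)\,dY = f(X)\,dX$. That is,

$$g(Y) = f(Y^{-1})\,|Y|^{-(p+1)} = \frac{|Y|^{-\alpha + \frac{p+1}{2}}e^{-\mathrm{tr}(Y^{-1})}}{\Gamma_p(\alpha)}\,|Y|^{-(p+1)}, \quad \Re(\alpha) > \frac{p-1}{2},$$

so that

$$g(Y) = \begin{cases} \dfrac{|Y|^{-\alpha - \frac{p+1}{2}}e^{-\mathrm{tr}(Y^{-1})}}{\Gamma_p(\alpha)}, & Y = Y' > O,\ \Re(\alpha) > \frac{p-1}{2} \\ 0, & \text{elsewhere.} \end{cases}$$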


5.2.2 Functions of matrix argument 


A real matrix-variate gamma function was introduced in Section 3.3.4 of Chapter 3. A corresponding gamma function in the complex space is defined in terms of a Hermitian positive definite matrix. [A Hermitian positive definite matrix means a matrix $X$ which is Hermitian, $X = X^*$ where $X^*$ denotes the conjugate transpose of $X$, and all the eigenvalues of $X$ are positive. Note that Hermitian means that all eigenvalues will be real.] Let $X$ be Hermitian positive definite and $p \times p$, denoted by $X > O$.


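Definition 5.2.1 (A complex matrix-variate gamma). Notation $\tilde{\Gamma}_p(\alpha)$: It is defined as

$$\tilde{\Gamma}_p(\alpha) = \pi^{\frac{p(p-1)}{2}}\,\Gamma(\alpha)\,\Gamma(\alpha-1)\cdots\Gamma(\alpha-p+1), \quad \Re(\alpha) > p-1 \tag{5.2.3}$$

and it has the integral representation

$$\tilde{\Gamma}_p(\alpha) = \int_{X>O}|X|^{\alpha-p}e^{-\mathrm{tr}(X)}\,dX. \tag{5.2.4}$$

A complex matrix-variate gamma density is associated with a complex matrix-variate gamma and a Hermitian positive definite matrix random variable.

Definition 5.2.2 (A complex matrix-variate gamma density).

$$f(X) = \begin{cases} \dfrac{|B|^{\alpha}|X|^{\alpha-p}e^{-\mathrm{tr}(BX)}}{\tilde{\Gamma}_p(\alpha)}, & X > O,\ \Re(\alpha) > p-1 \\ 0, & \text{elsewhere} \end{cases}$$

where $B = B^* > O$ is a constant matrix, free of $X$, and all matrices are $p \times p$.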


Another basic matrix-variate function is the beta function. It is associated with type-1 and type-2 (also known as inverted) beta integrals.

Definition 5.2.3 (A $p \times p$ matrix-variate beta in the real case). Notation $B_p(\alpha, \beta)$: It is defined as

$$B_p(\alpha, \beta) = \frac{\Gamma_p(\alpha)\Gamma_p(\beta)}{\Gamma_p(\alpha+\beta)}, \quad \Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}. \tag{5.2.5}$$

It has the following two types of integral representations:

$$B_p(\alpha, \beta) = \int_{O<X<I}|X|^{\alpha - \frac{p+1}{2}}|I-X|^{\beta - \frac{p+1}{2}}\,dX \quad\text{(type-1 beta integral)},$$

$$B_p(\alpha, \beta) = \int_{Y>O}|Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha+\beta)}\,dY \quad\text{(type-2 beta integral)}.$$

Here $\int_{O<X<I}$ means the integral over all $p \times p$ matrices $X$ such that $X > O$, $I - X > O$, or the symmetric matrix $X$ is positive definite and $I - X$ is also positive definite, where $I$ is the $p \times p$ identity matrix. This also means that all the eigenvalues of $X$ are in the open interval $0 < \lambda < 1$ where $\lambda$ is an eigenvalue. When $X$ is symmetric or Hermitian we can show that all eigenvalues are real.


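Definition 5.2.4 (A $p \times p$ matrix-variate beta density, real case).

$$f_1(X) = \begin{cases} \dfrac{1}{B_p(\alpha, \beta)}|X|^{\alpha - \frac{p+1}{2}}|I-X|^{\beta - \frac{p+1}{2}}, & O < X < I,\ \Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2} \quad\text{(type-1)} \\ 0, & \text{elsewhere;} \end{cases} \tag{5.2.6}$$

$$f_2(Y) = \begin{cases} \dfrac{1}{B_p(\alpha, \beta)}|Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha+\beta)}, & Y > O,\ \Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2} \quad\text{(type-2)} \\ 0, & \text{elsewhere.} \end{cases} \tag{5.2.7}$$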


Corresponding densities in the complex case can also be defined analogous to 
the real case. More on Jacobians of matrix transformations and functions of matrix 
argument can be read from the books [3, 2]. 


Example 5.2.5. Show that the two types of integrals defining the p x p matrix-variate 
beta functions in the real case give rise to the same quantity. 


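Solution 5.2.5. Consider the type-1 beta integral. Call it $I_1$. Then

$$I_1 = \int_{O<X<I}|X|^{\alpha - \frac{p+1}{2}}|I-X|^{\beta - \frac{p+1}{2}}\,dX.$$

Make the transformation

$$X = (I+Y)^{-\frac{1}{2}}\,Y\,(I+Y)^{-\frac{1}{2}} = Y^{\frac{1}{2}}(I+Y)^{-1}Y^{\frac{1}{2}} = (I+Y^{-1})^{-1}$$

where $Y^{\frac{1}{2}}$ and $(I+Y)^{\frac{1}{2}}$ denote the symmetric positive definite square roots of the symmetric positive definite matrices $Y$ and $I+Y$ respectively. Note that for defining definiteness the matrix has to be symmetric when real and Hermitian when in the complex domain. The word “symmetric” is repeated in order to stress the point. [There are no non-symmetric positive definite matrices.] Applying the Jacobian in (ii) above twice we have

$$X = (I+Y^{-1})^{-1} \;\Rightarrow\; dX = |I+Y^{-1}|^{-(p+1)}|Y|^{-(p+1)}\,dY = |I+Y|^{-(p+1)}\,dY.$$

Note that

$$|X|^{\alpha - \frac{p+1}{2}} = |Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha - \frac{p+1}{2})},$$

$$|I-X|^{\beta - \frac{p+1}{2}} = |I - (I+Y^{-1})^{-1}|^{\beta - \frac{p+1}{2}} = |I+Y^{-1}|^{-(\beta - \frac{p+1}{2})}|Y|^{-(\beta - \frac{p+1}{2})} = |I+Y|^{-(\beta - \frac{p+1}{2})}.$$

Therefore

$$|X|^{\alpha - \frac{p+1}{2}}|I-X|^{\beta - \frac{p+1}{2}}\,dX = |Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha+\beta)}\,dY.$$

Hence

$$I_1 = \int_{Y>O}|Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha+\beta)}\,dY = I_2,$$

where $I_2$ is the type-2 beta integral, and hence the result.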


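We can also see two results by making the transformation $U = I - X$ in the type-1 integral and $V = Y^{-1}$ in the type-2 integral. These are the following:

$$\int_{O<X<I}|X|^{\alpha - \frac{p+1}{2}}|I-X|^{\beta - \frac{p+1}{2}}\,dX = \int_{O<U<I}|U|^{\beta - \frac{p+1}{2}}|I-U|^{\alpha - \frac{p+1}{2}}\,dU = B_p(\alpha, \beta), \quad \Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}$$

or the parameters $\alpha$ and $\beta$ are interchanged. Similarly

$$\int_{Y>O}|Y|^{\alpha - \frac{p+1}{2}}|I+Y|^{-(\alpha+\beta)}\,dY = \int_{V>O}|V|^{\beta - \frac{p+1}{2}}|I+V|^{-(\alpha+\beta)}\,dV = B_p(\alpha, \beta), \quad \Re(\alpha) > \frac{p-1}{2},\ \Re(\beta) > \frac{p-1}{2}.$$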
Another very important Jacobian is obtained when a general real $p \times n$ matrix $X$, $n \ge p$, is uniquely represented in terms of a lower triangular matrix with positive diagonal elements $T = (t_{ij})$, $t_{ij} = 0$, $i < j$, $t_{jj} > 0$, $j = 1, \dots, p$ and a semi-orthonormal matrix $U$ where $U$ is $p \times n$, $UU' = I$. That is,

$$X = TU, \quad n \ge p.$$

It can be shown, going through the steps in establishing property (ii) above, that:

$$\text{(iii)}\qquad X = TU \;\Rightarrow\; dX = \left\{\prod_{j=1}^{p}t_{jj}^{\,n-j}\right\}dT\,dG, \quad n \ge p$$

where 
j ip 
dG = A A uj (duy) 
i=1i<j=1 


where $(du_j)$ is the $j$-th column vector of the differentials of $U$ and $u_i$ is the $i$-th column vector of $U$.


Here $U$ is an element of the Stiefel manifold $V_{p,n}$, $n \ge p$, of all semi-orthonormal $p \times n$ matrices $U$, such that $UU' = I$.

Example 5.2.6. Evaluate $\int_{V_{p,n}}dG$.


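Solution 5.2.6. Let $X$ be a $p \times n$ matrix of $np$ distinct real scalar variables, $n \ge p$. Consider the integral $\int_X e^{-\mathrm{tr}(XX')}\,dX$. Note that $\mathrm{tr}(XX')$ is the sum of squares of all the $np$ elements in $X$. Then integrating directly we have

$$\int_X e^{-\mathrm{tr}(XX')}\,dX = \prod_{i=1}^{p}\prod_{j=1}^{n}\int_{-\infty}^{\infty}e^{-x_{ij}^2}\,dx_{ij} = \pi^{\frac{np}{2}}. \tag{a}$$

Let us apply the transformation in (iii). Note that $XX' = TUU'T' = TT'$ and

$$X = TU \;\Rightarrow\; dX = \left\{\prod_{j=1}^{p}t_{jj}^{\,n-j}\right\}dT\,dG.$$

Then

$$\int_X e^{-\mathrm{tr}(XX')}\,dX = \left[\int_T\left\{\prod_{j=1}^{p}t_{jj}^{\,n-j}\right\}e^{-\mathrm{tr}(TT')}\,dT\right]\int_{V_{p,n}}dG. \tag{b}$$

But $\mathrm{tr}(TT') = \sum_{i \ge j}t_{ij}^2$, with

$$\int_0^{\infty}t_{jj}^{\,n-j}e^{-t_{jj}^2}\,dt_{jj} = \frac{1}{2}\Gamma\!\left(\frac{n}{2} - \frac{j-1}{2}\right) \quad\text{and}\quad \int_{-\infty}^{\infty}e^{-t_{ij}^2}\,dt_{ij} = \sqrt{\pi},\ i > j.$$

Then

$$\int_T\left\{\prod_{j=1}^{p}t_{jj}^{\,n-j}\right\}e^{-\mathrm{tr}(TT')}\,dT = \pi^{\frac{p(p-1)}{4}}\prod_{j=1}^{p}\frac{1}{2}\Gamma\!\left(\frac{n}{2} - \frac{j-1}{2}\right) = \frac{1}{2^p}\Gamma_p\!\left(\frac{n}{2}\right);$$

see the notation from (5.2.2). Comparing (a) and (b) above we have the following result:

$$\text{(iv)}\qquad \int_{V_{p,n}}dG = \frac{2^p\pi^{\frac{np}{2}}}{\Gamma_p(\frac{n}{2})}, \quad n \ge p. \tag{5.2.8}$$

Thus, the integral of $dG$, the volume element, over the Stiefel manifold gives the result in (iv). This is a very important result in the theory of functions of matrix argument where the integral is over the full Stiefel manifold (not over any subset of it). When $n = p$ we have the full orthogonal group denoted by $O(p)$. Then putting $n = p$ in (iv) we have another very important result:

$$\text{(v)}\qquad \int_{O(p)}dG = \frac{2^p\pi^{\frac{p^2}{2}}}{\Gamma_p(\frac{p}{2})}. \tag{5.2.9}$$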
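
The formula $\int_{V_{p,n}}dG = 2^p\pi^{np/2}/\Gamma_p(n/2)$, with $\Gamma_p$ as in (5.2.2), is easy to evaluate directly. The sketch below uses only the Python standard library; the function names are illustrative. For $p = 1$, $n = 3$ the value should agree with the surface area $4\pi$ of the unit sphere in three dimensions, and for $p = n = 1$ with the two points $\pm 1$.

```python
import math

def real_matrix_gamma(p, alpha):
    """Real matrix-variate gamma function Gamma_p(alpha) as in (5.2.2)."""
    value = math.pi ** (p * (p - 1) / 4)
    for j in range(p):
        value *= math.gamma(alpha - j / 2)   # Gamma(alpha), Gamma(alpha - 1/2), ...
    return value

def stiefel_volume(p, n):
    """Integral of dG over the Stiefel manifold V_{p,n}, n >= p, as in (5.2.8)."""
    return 2 ** p * math.pi ** (n * p / 2) / real_matrix_gamma(p, n / 2)

print(stiefel_volume(1, 1))   # the orthogonal group O(1) = {+1, -1}
print(stiefel_volume(1, 3))   # unit sphere in three dimensions
print(stiefel_volume(2, 2))   # full orthogonal group O(2), formula (5.2.9)
```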


Exercises 5.2 


5.2.1. Show that a real symmetric positive definite matrix A can be written as A = BB’ 
where B is nonsingular. 


5.2.2. Show that a real symmetric positive semi-definite matrix $A$ can be written as $A = CC'$, where $C$ is a rectangular matrix, and that if $C$ is of full rank then $A$ is positive definite.


5.2.3. Show that a Hermitian positive definite matrix A can be written as A = BB* 
where B is nonsingular. 


5.2.4. Construct the positive definite square root of (a) a positive definite matrix A, 
(b) a Hermitian positive definite matrix A. 


5.2.5. For real positive definite or Hermitian positive definite matrices $A$ and $B$ show that, denoting the positive definite square roots by $A^{\frac{1}{2}}$ and $B^{\frac{1}{2}}$,

$$|I - AB| = |I - BA| = |I - A^{\frac{1}{2}}BA^{\frac{1}{2}}| = |I - B^{\frac{1}{2}}AB^{\frac{1}{2}}|.$$

5.2.6. Let $X_1$ and $X_2$ be $p \times p$ matrix-variate real gamma variables having densities in (5.2.1) with parameters $\alpha_1$ and $\alpha_2$ respectively. Let $X_1$ and $X_2$ be statistically independently distributed (the joint density of $X_1$ and $X_2$ is the product of the individual densities of $X_1$ and $X_2$). Then show that

(a) $Y = X_1 + X_2$ is a matrix-variate gamma with parameter $\alpha_1 + \alpha_2$;

(b) $Y_1 = Y^{-\frac{1}{2}}X_1Y^{-\frac{1}{2}}$ has a matrix-variate type-1 beta density;

(c) $Y_2 = Y^{-\frac{1}{2}}X_2Y^{-\frac{1}{2}}$ has a matrix-variate type-1 beta density;

(d) $Y_3 = X_2^{-\frac{1}{2}}X_1X_2^{-\frac{1}{2}}$ and $Y_4 = X_1^{-\frac{1}{2}}X_2X_1^{-\frac{1}{2}}$ have matrix-variate type-2 beta densities;

(e) $Y$ and $Y_1$ as well as $Y$ and $Y_2$ are independently distributed.


5.2.7. Let $X$ be a $p \times 1$ real vector random variable and $T$ a $p \times 1$ vector of parameters (free of $X$). Then

$$X'T = T'X = t_1x_1 + \cdots + t_px_p$$

where $X' = (x_1, \dots, x_p)$ and $T' = (t_1, \dots, t_p)$. Then the expected value of $e^{T'X}$, denoted by $M_X(T)$, that is,

$$M_X(T) = E[e^{T'X}] = \int_X e^{T'X}f(X)\,dX$$

where $f(X)$ is the density of $X$, is called the moment generating function of $X$ when $X$ is continuous. Evaluate the moment generating function of $X$ when $X \sim N_p(\mu, V)$, $V > O$ of Example 5.2.1.


5.2.8. Consider a $p \times p$ real positive definite matrix random variable $X$ with the density $f(X)$. Let $T = (t_{ij}^*) = T' > O$, with $t_{ii}^* = t_{ii}$, $t_{ij}^* = \frac{1}{2}t_{ij}$ for $i \ne j$, and $t_{ij} = t_{ji}$ for all $i$ and $j$, be a $p \times p$ parameter matrix. $T$ is free of $X$. Then the moment generating function of $X$, denoted by $M_X(T)$, is given by

$$M_X(T) = E[e^{\mathrm{tr}(TX)}] = \int_{X>O}e^{\mathrm{tr}(TX)}f(X)\,dX.$$

Evaluate $M_X(T)$ for (a) the real matrix-variate gamma variable of (5.2.1); (b) the real type-2 matrix-variate beta variable of (5.2.7). Does this exist?
5.2.9. Consider a $p \times n$ matrix $X$, $n \ge p$, with the columns of $X$ independently distributed as $N_p(O, I)$ of Example 5.2.1. Independently distributed means the density of $X$, denoted by $f(X)$, is available by taking the product of the densities of $X_j \sim N_p(O, I)$, $j = 1, \dots, n$. Let $Y = XX'$. With the help of (iii), or otherwise, show that the density of $Y$ is a particular case of a matrix-variate gamma density given in (5.2.1).

5.2.10. Repeat Exercise 5.2.9 if the columns of $X$ are independently distributed as $N_p(O, V)$, $V > O$.


5.2.11. By using the moment generating function show that if the $p \times p$ matrix $X$ has a matrix-variate gamma density of (5.2.1) then every leading sub-matrix of $X$ also has a density in the same family of gamma densities.


5.2.12. By partitioning matrices and integrating out the remaining variables (not us- 
ing the moment generating function) establish the same result in Exercise 5.2.11. 


5.2.13. Let $X = X'$ and $T = T'$ be two $p \times p$ matrices. Show that

$$\mathrm{tr}(XT) = \mathrm{tr}(TX) = \sum_{i=1}^{p}\sum_{j=1}^{p}t_{ij}x_{ij}.$$

5.2.14. For $X = X'$ and $p \times p$ construct a $p \times p$ matrix $T$ such that $T = T'$ and $\mathrm{tr}(XT) = \sum_{i \ge j}t_{ij}x_{ij}$, so that $E[e^{-\mathrm{tr}(TX)}]$ can act as the Laplace transform of a real-valued scalar function $f(X)$ of $X > O$ where $T > O$.


5.3 Some topics from statistics 


In almost all branches of theoretical and applied statistics involving more than one 
random variable (real or complex) vectors, matrices, determinants and the associated 
properties play vital roles. Here we will list a few of those topics for the sake of illus- 
tration. 


5.3.1 Principal components analysis 


In a practical experimental study a scientist may make measurements on hundreds of characteristics of a given specimen. For example, if the aim is to identify the skeletal remains of 10 individuals and classify them as coming from the same groups (ethnic, racial or other), then all characteristics which may have some relevance to the study are measured on each skeleton. Initially the experimenter does not know which characteristics are relevant. If the experimenter has made measurements on 100 characteristics, such as the length of the thigh bone, dimensions of the skull (several measurements), nasal cavity and so on, then with 100 such characteristics the analysis of the data becomes too involved. The idea then is to cut down the number of variables (characteristics on which measurements are already made). If the aim is classification of a skeleton into one of $k$ racial groups, then variables which do not have much dispersion (variation) are not that important; one such variable may be sufficient. Hence the variables which have more scatter in them (squared scatter is measured in terms of variances) are very important. Thus one way of reducing the number of variables involved is to look for variables that have larger variances. These are very important variables as far as classification is concerned. Since linear functions also contain individual variables, it is more convenient to look at all possible linear functions and select the linear function with maximum variance, then the one with the second largest variance, and so on. Such a variable reduction process is called principal components analysis.

Let x;, ...,xy be p real scalar random variables with mean value zero. Since vari- 
ance is invariant under relocation of the variables the assumption that the mean value 
is zero can be made without any loss of generality. Consider an arbitrary linear func- 
tion: 


= al _y/ 
U = A4X1 + + AyXp =A’ X=X a, 


a x 
a-f: E d : (5.3.1) 
ap Xp 


where a4, ...,a, are constants (free of x,,...,x,). Then the variance of u, denoted by 


Var(u), is given by 


Var(u) = Var(a' X) = E(a' X)? = E(a' X)(a' X)' 
= E(a' XX' a) = a' [E(XX')]a = a' Va (5.3.2) 


where E denotes the expected value, V is the covariance matrix of the p x 1 vector 
random variable X. [Note that since a'X is a scalar quantity its square (a^ X Y? can also 
be written as it times its transpose or (a' X)? = (a' X)(a' X)! .] Here V = (Vij), v; = Vary), 
vij = Cov(x;,x;) = the covariance between x; and x;. Note that (5.3.2) with unrestricted 
a can go to «co since a' Va > 0 and then maximization does not make sense. Let us 
restrict a to the boundary of a p-sphere of unit radius, that is, a'a = 1. Going through 
Lagrangian multiplier let 


$ - a! Va - (a! a - 1) (5.3.3) 


where $\lambda$ is a Lagrangian multiplier. Differentiating $\phi$ partially with respect to $a$
(see vector derivatives from Chapter 1) we have

$$\frac{\partial\phi}{\partial a} = O \;\Rightarrow\; 2Va - 2\lambda a = O \;\Rightarrow\; Va = \lambda a. \tag{5.3.4}$$

Thus a vector $a$ maximizing $a'Va$ must satisfy (5.3.4). A non-null $a$ satisfying (5.3.4)
must have its coefficient matrix singular or its determinant zero or $|V - \lambda I| = 0$. That is,
$a$ is an eigenvector of $V$ corresponding to the eigenvalue $\lambda$. The equation

$$|V - \lambda I| = 0$$

has $p$ roots, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$. When the variables $x_1,\ldots,x_p$ are not
linearly dependent (no experimenter will include a variable which is linearly dependent
on other variables under study because no additional information is conveyed by such
a variable) $V$ is real symmetric positive definite and we have taken $a'a = 1$. Then from
(5.3.4)

$$\lambda_1 = a_1'Va_1,\quad Va_1 = \lambda_1a_1,\quad a_1'a_1 = 1,\quad
a_1 = \begin{bmatrix} a_{11}\\ \vdots\\ a_{p1} \end{bmatrix}$$


where $a_1$ is the eigenvector corresponding to the largest eigenvalue of $V$. Thus the first
principal component is

$$u_1 = a_1'X = a_{11}x_1 + a_{21}x_2 + \cdots + a_{p1}x_p$$

with variance $\operatorname{Var}(u_1) = a_1'Va_1 = \lambda_1$. Now take the second largest eigenvalue
$\lambda_2$ and a corresponding eigenvector $a_2$ such that $a_2'a_2 = 1$. Then

$$u_2 = a_2'X = a_{12}x_1 + a_{22}x_2 + \cdots + a_{p2}x_p$$

is the second principal component with the variance $\lambda_2$ and so on. Since $V = V'$, real
symmetric, the eigenvectors for different eigenvalues will be orthogonal. Thus the
principal component constructed at the $r$-th stage will be orthogonal to all the
others (the coefficient vectors are mutually orthogonal).


Example 5.3.1. Show that the following $V$ can represent a covariance matrix and
compute the principal components where

$$V = \begin{bmatrix} 2 & 0 & 1\\ 0 & 2 & 1\\ 1 & 1 & 2 \end{bmatrix}.$$


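Solution 5.3.1. In order for $V$ to be the covariance matrix of some $3\times 1$ real vector
random variable, $V$ must be symmetric and at least positive semi-definite. Let us check
the leading minors.

$$2 > 0,\quad \begin{vmatrix} 2 & 0\\ 0 & 2 \end{vmatrix} = 4 > 0,\quad
\begin{vmatrix} 2 & 0 & 1\\ 0 & 2 & 1\\ 1 & 1 & 2 \end{vmatrix} = 4 > 0$$

and $V = V'$. Hence $V = V' > 0$ (symmetric positive definite). In order to compute the
principal components one needs to compute the eigenvalues of $V$. Consider

$$|V - \lambda I| = 0 \;\Rightarrow\;
\begin{vmatrix} 2-\lambda & 0 & 1\\ 0 & 2-\lambda & 1\\ 1 & 1 & 2-\lambda \end{vmatrix} = 0
\;\Rightarrow\; (2-\lambda)(\lambda^2 - 4\lambda + 2) = 0$$
$$\Rightarrow\; \lambda_1 = 2+\sqrt{2},\quad \lambda_2 = 2,\quad \lambda_3 = 2-\sqrt{2}$$

are the three roots. Now, consider the largest one, namely $\lambda_1 = 2+\sqrt{2}$. An eigenvector
corresponding to $\lambda_1$ is given by

$$(V - \lambda_1I)X = O \;\Rightarrow\;
\begin{bmatrix} -\sqrt{2} & 0 & 1\\ 0 & -\sqrt{2} & 1\\ 1 & 1 & -\sqrt{2} \end{bmatrix}
\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \begin{bmatrix} 0\\ 0\\ 0 \end{bmatrix}
\;\Rightarrow\; X_1 = \begin{bmatrix} 1\\ 1\\ \sqrt{2} \end{bmatrix}.$$

Normalizing it we have

$$a_1 = \begin{bmatrix} \frac{1}{2}\\ \frac{1}{2}\\ \frac{\sqrt{2}}{2} \end{bmatrix},\quad a_1'a_1 = 1$$

is the normalized $X_1$. Hence the first principal component is

$$u_1 = a_1'X = \frac{1}{2}x_1 + \frac{1}{2}x_2 + \frac{\sqrt{2}}{2}x_3.$$

Now take $\lambda_2 = 2$. $(V - \lambda_2I)X = O$ gives an

$$X_2 = \begin{bmatrix} 1\\ -1\\ 0 \end{bmatrix} \quad\text{and then}\quad
a_2 = \begin{bmatrix} \frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\\ 0 \end{bmatrix}$$

and therefore the second principal component is

$$u_2 = a_2'X = \frac{1}{\sqrt{2}}x_1 - \frac{1}{\sqrt{2}}x_2.$$

Now take the third eigenvalue $\lambda_3$ and consider $(V - \lambda_3I)X = O$. This gives

$$X_3 = \begin{bmatrix} 1\\ 1\\ -\sqrt{2} \end{bmatrix} \quad\text{and then}\quad
a_3 = \begin{bmatrix} \frac{1}{2}\\ \frac{1}{2}\\ -\frac{\sqrt{2}}{2} \end{bmatrix}.$$

Therefore the third principal component is

$$u_3 = a_3'X = \frac{1}{2}x_1 + \frac{1}{2}x_2 - \frac{\sqrt{2}}{2}x_3.$$

Note that $a_1'a_2 = 0$, $a_1'a_3 = 0$, $a_2'a_3 = 0$, $a_i'a_i = 1$, $i = 1,2,3$.
$\operatorname{Var}(u_1) = a_1'Va_1 = \lambda_1 = 2+\sqrt{2}$, $\operatorname{Var}(u_2) = 2$,
$\operatorname{Var}(u_3) = 2-\sqrt{2}$. $\operatorname{Var}(u_1) > \operatorname{Var}(u_2) > \operatorname{Var}(u_3)$.

Observe that if one or more eigenvalues are zero then the corresponding principal
components need not be calculated, since when $\lambda = 0$ the variance is 0 and then the
variable is degenerate (a constant) having no scatter. No $\lambda$ can be negative since the
covariance matrix is at least positive semi-definite. Computations need be carried out
only as long as there are positive eigenvalues above a preassigned threshold number.
Variances falling below the threshold (eigenvalues below the threshold number) may not
be of any interest to the experimenter. In practice what is done is to have a cutoff point
for $\lambda_j = \operatorname{Var}(u_j)$, $j = 1,\ldots,p$ and include all principal components with
$\lambda_j$ bigger than or equal to the cutoff point in the study.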
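The steps of Example 5.3.1 can also be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the function name `principal_components` and its `cutoff` argument are introduced here for illustration and are not part of the text. It takes the covariance matrix whose leading minors are checked in Solution 5.3.1, computes the eigenvalues and unit eigenvectors, and drops components whose variance falls below a cutoff.

```python
import numpy as np

# Covariance matrix of Example 5.3.1 (the matrix whose leading minors are checked there).
V = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])

def principal_components(V, cutoff=0.0):
    """Return (variances, coefficient vectors), largest variance first.

    Column j of the returned matrix is the unit-length coefficient vector of
    the j-th principal component. Components with variance below `cutoff`
    (a hypothetical threshold chosen by the user) are dropped.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(V)  # eigh: for symmetric matrices
    order = np.argsort(eigenvalues)[::-1]          # sort in decreasing order
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    keep = eigenvalues >= cutoff
    return eigenvalues[keep], eigenvectors[:, keep]

variances, coefficients = principal_components(V)
print(variances)      # variances of u_1, u_2, u_3
print(coefficients)   # columns a_1, a_2, a_3 (each determined up to sign)
```

Since an eigenvector is determined only up to sign, the computed columns may differ from the $a_j$ of the solution by a factor of $-1$.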

When theoretical knowledge about the variables $x_1,\ldots,x_p$ is not available then
instead of $V$ we consider the sample covariance matrix $S$, which is an estimate of $V$, and
work with $S$. We get estimates of the principal components and their variances. One
drawback of the procedure of principal components analysis is that our initial aim was
to reduce the number of variables $p$ when $p$ is large. In order to apply the above
procedure we need the eigenvalues and eigenvectors of a $p\times p$ matrix with $p$ large. Hence
it is questionable whether computation-wise anything tangible is achieved unless the
eigenvalues are so far apart that the number of principal components is only a handful
when $p$, in fact, is really large. Since the problem is relevant only when $p$ is large an
illustrative example here is not feasible. What is done in Example 5.3.1 is to illustrate
the steps.
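Working with $S$ in place of $V$ can be sketched as follows. This is an illustration only, assuming NumPy; the data are simulated (hypothetical), and the names `data`, `S` and `scores` are introduced here and do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 500 observations on p = 3 variables, simulated for illustration.
V_true = np.array([[2.0, 0.0, 1.0],
                   [0.0, 2.0, 1.0],
                   [1.0, 1.0, 2.0]])
data = rng.multivariate_normal(mean=np.zeros(3), cov=V_true, size=500)

# Sample covariance matrix S (p x p); rows of `data` are observations.
S = np.cov(data, rowvar=False)

# Estimated variances and coefficient vectors of the principal components.
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
variances = eigenvalues[order]
coefficients = eigenvectors[:, order]

# Estimated principal components evaluated at each observation.
scores = (data - data.mean(axis=0)) @ coefficients
```

The sample covariance matrix of `scores` is diagonal with the estimated variances on the diagonal, which is the sample analogue of the orthogonality noted above.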


5.3.2 Regression analysis and model building 


One of the frequent activities in applied statistics, econometrics and other areas is to
predict a variable by either observing other variables or by prefixing other variables.
Let us call the variable to be predicted the dependent variable $y$ and the variables
which are used to predict $y$, say $x_1,\ldots,x_k$, the free (not a proper term in this respect)
variables. Examples would be (1) $y$ = market price of the stock of a particular product,
$x_1$ = market demand for that product, $x_2$ = price of a competing product, $x_3$ =
amount demanded of the company through law suits against the company, and so
on, or $x_1,x_2,\ldots$ are factors which have some relevance to $y$, (2) $y$ = rate of inflation,
$x_1$ = unit price of gasoline, $x_2$ = unit price of staple foods, $x_3$ = rent, and so on, (3) $y$ =
weight of a beef cow, $x_1$ = the age, $x_2$ = amount of food item 1 consumed, $x_3$ = amount
of green fodder consumed, and so on.

We can prove that the best predictor, best in the minimum mean square sense,
is the conditional expectation $E(y|x_1,\ldots,x_k)$ where $y$ is the variable to be predicted
and $x_1,\ldots,x_k$ are the free variables or the variables to be preassigned. For obtaining
the best predictor, one needs the conditional distribution of $y$ at given $x_1,\ldots,x_k$ and
the conditional expectation $E(y|x_1,\ldots,x_k)$ existing. If we do not have the conditional
distribution then what one can do is only to guess the nature of $E(y|x_1,\ldots,x_k)$ and
assume a functional form. Then try to estimate that function. If we do not have the
conditional distribution and if we suspect that the conditional expectation is a linear
function of the conditioned variables $x_1,\ldots,x_k$ then we may take a model
$E(y|x_1,\ldots,x_k) = a_0 + a_1x_1 + \cdots + a_kx_k$, a general linear function.

Suppose we assume that the expected value of $y$ at preassigned values of $x_1,\ldots,x_k$
is a linear function of the type,

$$E(y|x_1,\ldots,x_k) = a_0 + a_1x_1 + \cdots + a_kx_k \tag{5.3.5}$$

where $E(y|(\cdot))$ denotes the conditional expectation of $y$ given $(\cdot)$, $x_1,\ldots,x_k$ are
preassigned, and hence known, and the unknowns are the coefficients $a_0,\ldots,a_k$. Hence if
(5.3.5) is treated as a model that we are setting up then linearity or nonlinearity is
decided by the linearity in the unknowns $a_0,a_1,\ldots,a_k$ in the model. If it is treated
as a predictor function of $x_1,\ldots,x_k$ then linearity or nonlinearity is decided as a
function of $x_1,\ldots,x_k$. This is the essential difference between a predictor function and a
model set up to estimate the predictor function. Since the equation (5.3.5) is linear in
the unknowns we say that we have a linear model for $y$ in (5.3.5). If we had a regression
function (conditional expectation of $y$ given $x_1,\ldots,x_k$, which is the best predictor of $y$,
best in the minimum mean square sense) of the form


E(y|x,, ...,X4) = agaj e (eat tao (5.3.6) 


then we have a nonlinear predictor of $y$ since the right side in (5.3.6) is nonlinear in
$x_1,\ldots,x_k$, and if (5.3.6) is treated as a model set up to estimate a regression function
then the model is nonlinear because it is nonlinear in the unknowns $a_0,\ldots,a_k$.

Consider a linear model of the regression type such as the one in (5.3.5). Our first
aim is to estimate the unknowns $a_0,\ldots,a_k$. One distribution-free method (a method
that does not depend on the statistical distributions of the variables involved) that is
frequently used is the method of least squares. This needs some data points, observations
or preassigned values on $(x_1,\ldots,x_k)$ and the corresponding observations on $y$.
Let the $j$-th preassigned value on $(x_1,\ldots,x_k)$ be $(x_{1j},\ldots,x_{kj})$ and the corresponding
observation on $y$ be $y_j$. Since (5.3.5) is not a mathematical equation we cannot expect
every data point $(x_{1j},\ldots,x_{kj})$ substituted on the right to give exactly $y_j$. (The model is
simply assumed. There may or may not be such a linear relationship.) Write

$$y_j = a_0 + a_1x_{1j} + \cdots + a_kx_{kj} + e_j$$

where $e_j$ is the error in using $a_0 + a_1x_{1j} + \cdots + a_kx_{kj}$ to estimate $y_j$. Then

$$e_j = y_j - a_0 - a_1x_{1j} - \cdots - a_kx_{kj},\quad j = 1,\ldots,n$$

if there are $n$ data points. Since $k+1$ parameters are to be estimated we will take $n$ to
be at least $k+1$, $n \ge k+1$. In matrix notation we have

$$e = Y - X\beta,\quad
e = \begin{bmatrix} e_1\\ \vdots\\ e_n \end{bmatrix},\quad
Y = \begin{bmatrix} y_1\\ \vdots\\ y_n \end{bmatrix},\quad
\beta = \begin{bmatrix} a_0\\ \vdots\\ a_k \end{bmatrix},\quad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{k1}\\ \vdots & \vdots & & \vdots\\ 1 & x_{1n} & \cdots & x_{kn} \end{bmatrix}$$

where $e$ and $Y$ are $n\times 1$, $X$ is $n\times(k+1)$ and $\beta$ is $(k+1)\times 1$. The error sum of
squares is then

$$e_1^2 + \cdots + e_n^2 = e'e = (Y - X\beta)'(Y - X\beta). \tag{5.3.7}$$


If the parameters in $\beta$ are estimated by minimizing the error sum of squares then the
method is called the method of least squares. Differentiating (5.3.7) with respect to $\beta$
and equating to a null vector we have (see the vector derivatives from Chapter 1)

$$\frac{\partial}{\partial\beta}\,e'e = O \;\Rightarrow\; -2X'(Y - X\beta) = O
\;\Rightarrow\; X'X\hat{\beta} = X'Y \tag{5.3.8}$$

where $\hat{\beta}$ denotes the estimated $\beta$. In the theory of least square analysis the minimizing
equation in (5.3.8) is called the normal equation (nothing to do with Gaussian or normal
distribution). If $X'X$ is nonsingular, which happens when $X$ is of full rank, that is,
the rank of $X$ is $k+1 \le n$, then

$$\hat{\beta} = (X'X)^{-1}X'Y. \tag{5.3.9}$$


In a regression type model the final model is going to be used at preassigned points
$(x_1,\ldots,x_k)$. Naturally, one would not be taking linearly dependent rows for $X$. Even
if $x_1,\ldots,x_k$ are not linear functions of each other, when actual observations are made
on $(x_1,\ldots,x_k)$ there is a possibility of near singularity for $X'X$. In a regression type
model, more or less one can assume $X'X$ to be nonsingular. There are other models
such as design models where by the nature of the design itself $X'X$ is singular. Since
$X'Y$ in (5.3.8) is a linear function of the columns of $X'$ and since $X'X$ is also of the same
type the linear system in (5.3.8) is always consistent when $n \ge k+1$. The least square
minimum from (5.3.9), usually denoted by $s^2$, is given by

$$s^2 = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'(Y - X\hat{\beta})$$

since $X'(Y - X\hat{\beta}) = O$, normal equations;

$$s^2 = Y'Y - Y'X(X'X)^{-1}X'Y = Y'[I - X(X'X)^{-1}X']Y. \tag{5.3.10}$$

Note that the matrices

$$A = I - X(X'X)^{-1}X',\quad B = X(X'X)^{-1}X'$$

are idempotent, $A = A^2$ and $B = B^2$, and further, $AB = O$, that is, they are orthogonal to
each other. If $\beta = O$ then the least square minimum is

$$s_0^2 = Y'Y.$$

Thus

$$s_0^2 - s^2 = Y'Y - Y'[I - X(X'X)^{-1}X']Y = Y'X(X'X)^{-1}X'Y$$

is the sum of squares due to the presence of the parameter vector $\beta$. Comparing the
relative significance of $s_0^2 - s^2$ with $s^2$ is the basis of testing statistical hypotheses on $\beta$.
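The properties of the matrices $A = I - X(X'X)^{-1}X'$ and $B = X(X'X)^{-1}X'$ can be checked on a small case. The sketch below assumes NumPy and uses randomly generated (hypothetical) $X$ and $Y$; it is meant only as a numerical illustration of (5.3.9) and (5.3.10), not as a recommended way of fitting regressions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 8 observations, k = 2 free variables plus a column of ones.
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1)
Y = rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
B = X @ XtX_inv @ X.T        # B = X (X'X)^{-1} X'
A = np.eye(n) - B            # A = I - X (X'X)^{-1} X'

beta_hat = XtX_inv @ X.T @ Y  # least square estimate, as in (5.3.9)
s2 = Y @ A @ Y                # least square minimum, as in (5.3.10)
s0_2 = Y @ Y                  # least square minimum when beta = O
due_to_beta = s0_2 - s2       # sum of squares due to beta
```

Here `s2` agrees with the residual sum of squares $(Y - X\hat{\beta})'(Y - X\hat{\beta})$, and `due_to_beta` agrees with $Y'BY$.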
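If the parameter $a_0$ is to be separated then we can consider the vector

$$\begin{bmatrix} y_1 - \bar{y}\\ \vdots\\ y_n - \bar{y} \end{bmatrix}
= \begin{bmatrix} x_{11} - \bar{x}_1 & \cdots & x_{k1} - \bar{x}_k\\ \vdots & & \vdots\\ x_{1n} - \bar{x}_1 & \cdots & x_{kn} - \bar{x}_k \end{bmatrix}
\begin{bmatrix} a_1\\ \vdots\\ a_k \end{bmatrix}
+ \begin{bmatrix} e_1 - \bar{e}\\ \vdots\\ e_n - \bar{e} \end{bmatrix} \tag{5.3.11}$$

where $\bar{y} = \frac{1}{n}(y_1 + \cdots + y_n)$, $\bar{x}_i = \frac{1}{n}\sum_{j=1}^{n}x_{ij}$,
$\bar{e} = \frac{1}{n}(e_1 + \cdots + e_n)$, where $\bar{e}$ can be taken to
be zero without much loss of generality. The least square estimate of $\beta$ in this case,
where $\beta' = (a_1,\ldots,a_k)$, will have the same structure as in (5.3.9) but the $Y$ and $X$ are to
be replaced by $Y - \bar{Y}$ and $X - \bar{X}$ respectively, where

$$\bar{Y} = \begin{bmatrix} \bar{y}\\ \vdots\\ \bar{y} \end{bmatrix},\quad
\bar{X} = \begin{bmatrix} \bar{x}_1 & \cdots & \bar{x}_k\\ \vdots & & \vdots\\ \bar{x}_1 & \cdots & \bar{x}_k \end{bmatrix}.$$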
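That the centered form gives the same estimates of $a_1,\ldots,a_k$ as the full model can be seen on a small numerical case. The following is a sketch under stated assumptions: NumPy is assumed, the data are simulated (hypothetical), and the relation used for the separated $a_0$, namely $\hat{a}_0 = \bar{y} - \hat{a}_1\bar{x}_1 - \cdots - \hat{a}_k\bar{x}_k$, is the standard one and is not spelled out in the text.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: n = 10 observations on k = 2 free variables.
n, k = 10, 2
x = rng.normal(size=(n, k))
y = 1.5 + x @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=n)

# Full model: column of ones included, estimates (a_0, a_1, ..., a_k).
X_full = np.column_stack([np.ones(n), x])
beta_full = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)

# Centered model: Y and X replaced by Y - Ybar and X - Xbar, estimates (a_1, ..., a_k).
xc = x - x.mean(axis=0)
yc = y - y.mean()
slopes = np.linalg.solve(xc.T @ xc, xc.T @ yc)

# The separated parameter a_0 recovered from the means (standard relation, see lead-in).
intercept = y.mean() - x.mean(axis=0) @ slopes
```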


Example 5.3.2. If the expected value of $y$ at preassigned values of $x_1$ and $x_2$ is
suspected to be a function of the form $a_0 + a_1x_1^2 + a_2x_1x_2$, estimate the prediction function
by the method of least squares and estimate $y$ at $(x_1,x_2) = (2,1)$ and at $(5,7)$ respectively
by using the following data points:

$$\begin{array}{c|rrrrrrr}
x_1 & 0 & 1 & 1 & 1 & 2 & 0 & 1\\
x_2 & 0 & 1 & -1 & -1 & 0 & 2 & 2\\
y & 2 & 4 & 1 & 1 & 5 & 2 & 5
\end{array}$$
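Solution 5.3.2. Writing in matrix notation we have

$$Y = \begin{bmatrix} 2\\ 4\\ 1\\ 1\\ 5\\ 2\\ 5 \end{bmatrix},\quad
\beta = \begin{bmatrix} a_0\\ a_1\\ a_2 \end{bmatrix},\quad
X = \begin{bmatrix} 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 1 & -1\\ 1 & 1 & -1\\ 1 & 4 & 0\\ 1 & 0 & 0\\ 1 & 1 & 2 \end{bmatrix}$$

where the columns of $X$ correspond to $1, x_1^2, x_1x_2$;

$$X'X = \begin{bmatrix} 7 & 8 & 1\\ 8 & 20 & 1\\ 1 & 1 & 7 \end{bmatrix},\quad
(X'X)^{-1} = \frac{1}{521}\begin{bmatrix} 139 & -55 & -12\\ -55 & 48 & 1\\ -12 & 1 & 76 \end{bmatrix},$$

$$X'Y = \begin{bmatrix} 20\\ 31\\ 12 \end{bmatrix},\quad
(X'X)^{-1}X'Y = \frac{1}{521}\begin{bmatrix} 931\\ 400\\ 703 \end{bmatrix}.$$

Therefore

$$\hat{a}_0 = \frac{931}{521},\quad \hat{a}_1 = \frac{400}{521},\quad \hat{a}_2 = \frac{703}{521}$$

and the estimated prediction function is

$$y = \frac{931}{521} + \frac{400}{521}x_1^2 + \frac{703}{521}x_1x_2.$$

The predicted $y$ at $(2,1)$, denoted by $\hat{y}$, is given by

$$\hat{y} = \frac{931}{521} + \frac{400}{521}(2)^2 + \frac{703}{521}(2)(1) = 7.5566.$$

Since the point $(5,7)$ is too far out of the range of observations, based on which the
model is constructed, it is not reasonable to predict at $(5,7)$ by using this model unless
it is certain that the behavior of the function is the same for all points on the
$(x_1,x_2)$-plane.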
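The computations of Solution 5.3.2 can be reproduced with a few lines of code. This is a minimal sketch assuming NumPy; the data are those of Example 5.3.2, with the fourth value of $x_2$ taken as $-1$ (an assumption, implied by the fourth row of $X$ in the solution), and the function name `predict` is introduced here for illustration.

```python
import numpy as np

# Data of Example 5.3.2 (fourth x_2 value assumed to be -1, as implied by X in the solution).
x1 = np.array([0.0, 1.0, 1.0, 1.0, 2.0, 0.0, 1.0])
x2 = np.array([0.0, 1.0, -1.0, -1.0, 0.0, 2.0, 2.0])
y = np.array([2.0, 4.0, 1.0, 1.0, 5.0, 2.0, 5.0])

# Columns of X correspond to 1, x_1^2, x_1 x_2.
X = np.column_stack([np.ones_like(x1), x1**2, x1 * x2])

# Solve the normal equations X'X beta = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

def predict(u1, u2):
    """Estimated prediction function a_0 + a_1 u1^2 + a_2 u1 u2."""
    return beta_hat[0] + beta_hat[1] * u1**2 + beta_hat[2] * u1 * u2

print(beta_hat * 521)   # compare with (931, 400, 703)
print(predict(2, 1))    # compare with 7.5566
```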


When the model is linear in the parameters (unknowns) the above procedure can 
be adopted but when the model is nonlinear in the parameters one has to go through 
a nonlinear least squares procedure. Many such algorithms are available in the litera- 
ture. One such algorithm may be seen from [6]. 


5.3.3 Design type models 


Various types of models appear in the area of statistical design of experiments. A
special case of a two-way analysis with fixed effect model, applicable in randomized block
designs, is the following: Suppose that a controlled experiment is conducted to study
the effects of $r$ fertilizers on $s$ varieties of corn, for example 2 fertilizers on 3 varieties
of corn. Suppose that $rs$ experimental plots of land of the same size which are
homogeneous with respect to all known factors of variation are selected and each fertilizer
is applied to randomly assigned $s$ plots where a particular variety of corn is planted.
Suppose we have one observation $y_{ij}$ (yield of corn) corresponding to each (fertilizer,
variety) combination. Then $y_{ij}$ can be contributed by some general effect, deviation
from the general effect due to the $i$-th fertilizer and the $j$-th variety, and a random part.
The simplest model of this type is

$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij},\quad i = 1,\ldots,r,\quad j = 1,\ldots,s \tag{5.3.12}$$

where $\mu$ is the general effect, $\alpha_i$ is the deviation from the general effect due to the $i$-th
fertilizer, $\beta_j$ is the deviation from the general effect due to the $j$-th variety and $e_{ij}$ is
the random part, which is the sum total of contributions from all unknown factors which
could not be controlled through the design. The model in (5.3.12), when $\mu$, the $\alpha_i$'s and
the $\beta_j$'s are fixed, but unknown, and the $e_{ij}$'s are random, is known as a linear, additive, fixed
effect two-way classification model without interaction (one of the simplest models
one can consider under this situation). Writing (5.3.12) in matrix notation we have


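$$Y = X\beta + e,$$

where

$$Y' = (y_{11},\ldots,y_{1s},\,y_{21},\ldots,y_{2s},\,\ldots,\,y_{r1},\ldots,y_{rs}),\quad
\beta' = (\mu,\alpha_1,\ldots,\alpha_r,\beta_1,\ldots,\beta_s),$$
$$e' = (e_{11},\ldots,e_{1s},\,e_{21},\ldots,e_{2s},\,\ldots,\,e_{r1},\ldots,e_{rs}),$$

$$X = \begin{bmatrix}
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0\\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1\\
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0\\
1 & 0 & 1 & \cdots & 0 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & 1 & 0 & 0 & \cdots & 1
\end{bmatrix}.$$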


Since the sum of the last $s$ columns is equal to the first column the matrix $X$ is not
of full rank. This is due to the design itself. The matrix $X$ here is called the design
matrix. Thus $X'X$ is singular. Hence if we apply the matrix method we could only come to an
equation corresponding to (5.3.8) and then go for other methods of solving that equation
or compute a g-inverse (generalized inverse) $(X'X)^{-}$ of $X'X$ so that one solution
of the equation corresponding to (5.3.8) is

$$\hat{\beta} = (X'X)^{-}X'Y.$$


The theory of g-inverses is already available for dealing with such problems. But in 
the fields of design of experiments, analysis of variance and analysis of covariance, 
matrix methods will be less efficient compared to computing the error sum of squares 
as a sum, computing the derivatives directly and then solving the resulting system of 
normal equations one by one. Hence we will not elaborate on this topic any further. 
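The rank deficiency of the design matrix can be seen on the small case of 2 fertilizers and 3 varieties. The sketch below assumes NumPy; the yields in `Y` are hypothetical numbers, the function name `two_way_design` is introduced here, and the Moore-Penrose inverse is used as one particular choice of g-inverse, which the text does not prescribe.

```python
import numpy as np

def two_way_design(r, s):
    """Design matrix of the model y_ij = mu + alpha_i + beta_j + e_ij.

    Rows are ordered (1,1),...,(1,s),(2,1),...,(r,s); columns correspond to
    mu, alpha_1..alpha_r, beta_1..beta_s.
    """
    rows = []
    for i in range(r):
        for j in range(s):
            row = np.zeros(1 + r + s)
            row[0] = 1.0          # general effect
            row[1 + i] = 1.0      # i-th fertilizer
            row[1 + r + j] = 1.0  # j-th variety
            rows.append(row)
    return np.array(rows)

r, s = 2, 3
X = two_way_design(r, s)
rank = np.linalg.matrix_rank(X)   # r + s - 1, less than the number of columns r + s + 1

# Hypothetical yields y_11, y_12, y_13, y_21, y_22, y_23.
Y = np.array([20.0, 25.0, 22.0, 28.0, 31.0, 30.0])

# One g-inverse of X'X: the Moore-Penrose inverse (an illustrative choice).
G = np.linalg.pinv(X.T @ X, rcond=1e-10)
beta_hat = G @ X.T @ Y            # one solution of the normal equations
```

Different g-inverses give different solutions `beta_hat`, but each of them satisfies the equation corresponding to (5.3.8).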


5.3.4 Canonical correlation analysis 


In Section 5.3.2 we dealt with the problem of predicting one real scalar variable by
using one set of variables. We generalize this problem here. We would like to predict
real scalar variables in one set by using another set of real scalar variables. If the set to
be predicted contains only one variable then it is the case of Section 5.3.2. Instead of
treating them as two sets of variables we consider all possible linear functions in each
set. Then use a principle of maximizing a measure of scale-free joint dispersion known
as correlation (nothing to do with relationships, does not measure relationships) for
constructing such predictors. Let all the variables in the two sets be denoted by

$$X = \begin{bmatrix} X_{(1)}\\ X_{(2)} \end{bmatrix},\quad
X_{(1)} = \begin{bmatrix} x_1\\ \vdots\\ x_{p_1} \end{bmatrix},\quad
X_{(2)} = \begin{bmatrix} x_{p_1+1}\\ \vdots\\ x_{p_1+p_2} \end{bmatrix},\quad p = p_1 + p_2.$$

For convenience let us assume that $p_1 \le p_2$. The covariance matrix of $X$, denoted by $V$,
is

$$V = E\{[X - E(X)][X - E(X)]'\}.$$

Let us partition $V$ conformally with $X$. That is,

$$V = \begin{bmatrix} V_{11} & V_{12}\\ V_{21} & V_{22} \end{bmatrix}$$

where $V_{11}$ is $p_1\times p_1$, $V_{21} = V_{12}'$ since $V$ is symmetric. Let us consider arbitrary linear
functions of $X_{(1)}$ and $X_{(2)}$. Let $u = \alpha'X_{(1)}$, $w = \beta'X_{(2)}$ where
$\alpha' = (\alpha_1,\ldots,\alpha_{p_1})$ and $\beta' = (\beta_1,\ldots,\beta_{p_2})$ are arbitrary constants.
Since correlation is invariant under scaling
and relocation of the variables involved we can assume, without any loss of generality,
that $\operatorname{Var}(u) = 1$ and $\operatorname{Var}(w) = 1$. That is,

$$1 = \operatorname{Var}(u) = \alpha'V_{11}\alpha,\quad 1 = \operatorname{Var}(w) = \beta'V_{22}\beta.$$



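The correlation between $u$ and $w$ is

$$\frac{\operatorname{Cov}(u,w)}{[\operatorname{Var}(u)\operatorname{Var}(w)]^{1/2}} = \operatorname{Cov}(u,w) = \alpha'V_{12}\beta$$

since $\operatorname{Var}(u) = 1$ and $\operatorname{Var}(w) = 1$. Then the principle of maximizing
correlation reduces to maximizing $\alpha'V_{12}\beta$ subject to the conditions $\alpha'V_{11}\alpha = 1$ and
$\beta'V_{22}\beta = 1$. Using Lagrangian multipliers $-\frac{1}{2}\lambda$ and $-\frac{1}{2}\mu$ the function to be
maximized is given by

$$\phi = \alpha'V_{12}\beta - \frac{1}{2}\lambda(\alpha'V_{11}\alpha - 1) - \frac{1}{2}\mu(\beta'V_{22}\beta - 1).$$

The partial derivatives equated to null give,

$$\frac{\partial\phi}{\partial\alpha} = O,\quad \frac{\partial\phi}{\partial\beta} = O \;\Rightarrow$$

$$V_{12}\beta - \lambda V_{11}\alpha = O, \tag{a}$$
$$-\mu V_{22}\beta + V_{21}\alpha = O. \tag{b}$$

Since $\alpha'V_{11}\alpha = 1 = \beta'V_{22}\beta$ we have $\lambda = \mu = \alpha'V_{12}\beta$. Then (a) and (b) reduce to

$$\begin{bmatrix} -\lambda V_{11} & V_{12}\\ V_{21} & -\lambda V_{22} \end{bmatrix}
\begin{bmatrix} \alpha\\ \beta \end{bmatrix} = O.$$

For a non-null solution we must have the coefficient matrix singular or the determinant
zero. That is

$$\begin{vmatrix} -\lambda V_{11} & V_{12}\\ V_{21} & -\lambda V_{22} \end{vmatrix} = 0. \tag{c}$$

The determinant on the left is a polynomial in $\lambda$ of degree $p = p_1 + p_2$. Let
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_{p_1+p_2}$
be the roots of the determinantal equation (c). Since $\lambda = \alpha'V_{12}\beta$ = correlation between
$u$ and $w$, the maximum correlation is available for the largest root $\lambda_1$. With this $\lambda_1$ solve
(a) and (b) for $\alpha$ and $\beta$. Let the solution be $\alpha_{(1)}, \beta_{(1)}$. Normalize by using
$\alpha_{(1)}'V_{11}\alpha_{(1)} = 1$ and $\beta_{(1)}'V_{22}\beta_{(1)} = 1$ to obtain the corresponding normalized
vectors, denoted by $\alpha^{(1)}$ and $\beta^{(1)}$. Then the first pair of canonical variables is

$$(u_1,w_1) = (\alpha^{(1)\prime}X_{(1)},\ \beta^{(1)\prime}X_{(2)}).$$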


Note that $\alpha^{(1)}$ and $\beta^{(1)}$ are not only the solutions of (a) and (b) but also satisfy the
conditions $\alpha'V_{11}\alpha = 1$ and $\beta'V_{22}\beta = 1$. Thus $\beta^{(1)\prime}X_{(2)}$ is the best predictor, in
the sense of maximum correlation, for predicting the linear function $\alpha^{(1)\prime}X_{(1)}$ and vice
versa. When looking for the second pair of canonical variables we can impose the
additional conditions that the second pair $(u,w)$ should be such that $u$ is non-correlated
with $u_1$ and $w_1$, and $w$ is non-correlated with $u_1$ and $w_1$ and at the same time $u$ and
$w$ have the maximum correlation between them, subject to the normalization conditions
$\operatorname{Var}(u) = \alpha'V_{11}\alpha = 1$ and $\operatorname{Var}(w) = \beta'V_{22}\beta = 1$. We can continue
requiring the new pair to be non-correlated with each of the earlier pairs as well as with
maximum correlation and normalized. Using more Lagrangian multipliers the function to be
maximized at the $(r+1)$-st stage is,

$$\phi_{r+1} = \alpha'V_{12}\beta - \frac{1}{2}\lambda(\alpha'V_{11}\alpha - 1) - \frac{1}{2}\mu(\beta'V_{22}\beta - 1)
+ \sum_{i=1}^{r}\nu_i\,\alpha'V_{11}\alpha^{(i)} + \sum_{i=1}^{r}\theta_i\,\beta'V_{22}\beta^{(i)}.$$

Then

$$\frac{\partial\phi_{r+1}}{\partial\alpha} = O,\quad \frac{\partial\phi_{r+1}}{\partial\beta} = O \;\Rightarrow$$

$$V_{12}\beta - \lambda V_{11}\alpha + \sum_{i=1}^{r}\nu_iV_{11}\alpha^{(i)} = O, \tag{d}$$
$$V_{21}\alpha - \mu V_{22}\beta + \sum_{i=1}^{r}\theta_iV_{22}\beta^{(i)} = O. \tag{e}$$

Premultiplying (d) by $\alpha^{(j)\prime}$ and (e) by $\beta^{(j)\prime}$ we have

$$0 = \nu_j\,\alpha^{(j)\prime}V_{11}\alpha^{(j)} = \nu_j$$

and

$$0 = \theta_j\,\beta^{(j)\prime}V_{22}\beta^{(j)} = \theta_j.$$

But $\alpha^{(j)\prime}V_{11}\alpha^{(j)} = 1 \ne 0$ and $\beta^{(j)\prime}V_{22}\beta^{(j)} = 1 \ne 0$. Thus the equations go
back to the original equations (a) and (b). Therefore take the second largest root of (c), take
solutions $\alpha^{(2)}$ and $\beta^{(2)}$ from (a) and (b) which will be such that
$\alpha^{(2)\prime}V_{11}\alpha^{(2)} = 1$, $\beta^{(2)\prime}V_{22}\beta^{(2)} = 1$,
$\alpha^{(2)\prime}V_{11}\alpha^{(1)} = 0$, $\alpha^{(2)\prime}V_{12}\beta^{(1)} = 0$,
$\beta^{(2)\prime}V_{22}\beta^{(1)} = 0$, $\beta^{(2)\prime}V_{21}\alpha^{(1)} = 0$. Then

$$(u_2,w_2) = (\alpha^{(2)\prime}X_{(1)},\ \beta^{(2)\prime}X_{(2)})$$

is the second pair of canonical variables and so on. For more properties on canonical
variables and canonical correlations see books on multivariate statistical analysis.
When computing the roots of the determinantal equation (c) the following observations
will be helpful. From (a) and (b) for $\lambda = \mu$, we can eliminate one of the vectors
$\alpha$ or $\beta$ and write separate equations, one in $\alpha$ alone and one in $\beta$ alone. For example,
multiply (b), with $\mu = \lambda$, by $\lambda$ and (a) by $V_{21}V_{11}^{-1}$ when $|V_{11}| \ne 0$ and add the two
to obtain

$$(V_{21}V_{11}^{-1}V_{12} - \lambda^2V_{22})\beta = O \;\Rightarrow\;
|V_{21}V_{11}^{-1}V_{12} - \lambda^2V_{22}| = 0 \;\Rightarrow\;
|V_{22}^{-1}V_{21}V_{11}^{-1}V_{12} - \lambda^2I| = 0 \tag{f}$$

and similarly

$$(V_{12}V_{22}^{-1}V_{21} - \lambda^2V_{11})\alpha = O \;\Rightarrow\;
|V_{11}^{-1}V_{12}V_{22}^{-1}V_{21} - \lambda^2I| = 0. \tag{g}$$

From (f) and (g) note that $\lambda^2$ can be taken as the eigenvalues of $V_{22}^{-1}V_{21}V_{11}^{-1}V_{12}$ or of
$V_{11}^{-1}V_{12}V_{22}^{-1}V_{21}$, or as the roots of the determinantal equations (f) and (g), and $\alpha$ and
$\beta$ the corresponding eigenvectors. Thus the problem again reduces to an eigenvalue
problem.


Example 5.3.3. (1) Show that the following matrix $V$ can be the covariance matrix
of the real random vector $X' = (x_1,x_2,x_3,x_4)$. (2) Let $X_{(1)}' = (x_1,x_2)$ and $X_{(2)}' = (x_3,x_4)$.
Suppose we wish to predict linear functions of $x_1$ and $x_2$ by using linear functions of
$x_3$ and $x_4$. Construct the best predictors, best in the sense of maximizing correlations.

$$V = \begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 2 & -1 & 0\\ 1 & -1 & 6 & 4\\ 1 & 0 & 4 & 4 \end{bmatrix}$$


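Solution 5.3.3. Let us consider the leading minors of $V$. Note that

$$1 > 0,\quad \begin{vmatrix} 1 & 1\\ 1 & 2 \end{vmatrix} = 1 > 0,\quad
\begin{vmatrix} 1 & 1 & 1\\ 1 & 2 & -1\\ 1 & -1 & 6 \end{vmatrix} = 1 > 0,\quad |V| > 0.$$

Since $V = V'$ also, $V$ is symmetric positive definite and hence it can be a covariance
matrix. We are looking for the pairs of canonical variables. Here

$$V_{11} = \begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix},\quad
V_{12} = \begin{bmatrix} 1 & 1\\ -1 & 0 \end{bmatrix} = V_{21}',\quad
V_{22} = \begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}.$$

Consider the equation

$$|V_{21}V_{11}^{-1}V_{12} - \nu V_{22}| = 0,\quad \nu = \lambda^2 \;\Rightarrow\;
\left|\begin{bmatrix} 5 & 3\\ 3 & 2 \end{bmatrix} - \nu\begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}\right| = 0$$
$$\Rightarrow\; 8\nu^2 - 8\nu + 1 = 0 \;\Rightarrow\; \nu = \frac{1}{2} \pm \frac{1}{4}\sqrt{2}.$$

Let $\nu_1 = \frac{1}{2} + \frac{\sqrt{2}}{4}$ and $\nu_2 = \frac{1}{2} - \frac{\sqrt{2}}{4}$. For $\nu_1$ consider the equation

$$[V_{21}V_{11}^{-1}V_{12} - \nu_1V_{22}]\beta = O \;\Rightarrow\;
\left\{\begin{bmatrix} 5 & 3\\ 3 & 2 \end{bmatrix} - \left(\frac{1}{2} + \frac{\sqrt{2}}{4}\right)\begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}\right\}
\begin{bmatrix} \beta_1\\ \beta_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}
\;\Rightarrow\; b_1 = \begin{bmatrix} -(2+\sqrt{2})\\ 1 \end{bmatrix}$$

is one solution. Let us normalize through the relation $1 = b_1'V_{22}b_1$. That is

$$b_1'V_{22}b_1 = [-(2+\sqrt{2}),\ 1]\begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}\begin{bmatrix} -(2+\sqrt{2})\\ 1 \end{bmatrix} = 4(2+\sqrt{2})^2.$$

Then

$$\beta^{(1)} = \frac{1}{2(2+\sqrt{2})}\begin{bmatrix} -(2+\sqrt{2})\\ 1 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{2}\\ \frac{1}{2(2+\sqrt{2})} \end{bmatrix}.$$

Now consider

$$(V_{12}V_{22}^{-1}V_{21} - \nu_1V_{11})\alpha = O \;\Rightarrow\;
\left\{\frac{1}{4}\begin{bmatrix} 1 & 0\\ 0 & 2 \end{bmatrix} - \left(\frac{1}{2} + \frac{\sqrt{2}}{4}\right)\begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}\right\}
\begin{bmatrix} \alpha_1\\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}$$
$$\Rightarrow\; \begin{bmatrix} 1+\sqrt{2} & 2+\sqrt{2}\\ 2+\sqrt{2} & 2+2\sqrt{2} \end{bmatrix}
\begin{bmatrix} \alpha_1\\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}
\;\Rightarrow\; a_1 = \begin{bmatrix} -\sqrt{2}\\ 1 \end{bmatrix}$$

is one solution. Consider

$$a_1'V_{11}a_1 = [-\sqrt{2},\ 1]\begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}\begin{bmatrix} -\sqrt{2}\\ 1 \end{bmatrix} = 4 - 2\sqrt{2}.$$

Then

$$\alpha^{(1)} = \frac{1}{\sqrt{4-2\sqrt{2}}}\begin{bmatrix} -\sqrt{2}\\ 1 \end{bmatrix}.$$

The first pair of canonical variables is then $(u_1,w_1)$,

$$u_1 = \alpha^{(1)\prime}X_{(1)} = \frac{1}{\sqrt{4-2\sqrt{2}}}(-\sqrt{2}\,x_1 + x_2),\quad
w_1 = \beta^{(1)\prime}X_{(2)} = -\frac{1}{2}x_3 + \frac{1}{2(2+\sqrt{2})}x_4.$$

$w_1$ is the best predictor of $u_1$ and vice versa, best in the sense of maximum correlation.
Now let us look for the second pair of canonical variables. This is available from the
second root $\nu_2 = \frac{1}{2} - \frac{\sqrt{2}}{4}$. Consider the equation

$$(V_{21}V_{11}^{-1}V_{12} - \nu_2V_{22})\beta = O \;\Rightarrow\;
\left\{\begin{bmatrix} 5 & 3\\ 3 & 2 \end{bmatrix} - \left(\frac{1}{2} - \frac{\sqrt{2}}{4}\right)\begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}\right\}
\begin{bmatrix} \beta_1\\ \beta_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}
\;\Rightarrow\; b_2 = \begin{bmatrix} \sqrt{2}-2\\ 1 \end{bmatrix}$$

is one solution. Let us normalize. Consider

$$b_2'V_{22}b_2 = [\sqrt{2}-2,\ 1]\begin{bmatrix} 6 & 4\\ 4 & 4 \end{bmatrix}\begin{bmatrix} \sqrt{2}-2\\ 1 \end{bmatrix} = 4(2-\sqrt{2})^2
\;\Rightarrow\; \beta^{(2)} = \frac{1}{2(2-\sqrt{2})}\begin{bmatrix} \sqrt{2}-2\\ 1 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{2}\\ \frac{1}{2(2-\sqrt{2})} \end{bmatrix}.$$

Let us look for $\alpha$. Consider the equation

$$(V_{12}V_{22}^{-1}V_{21} - \nu_2V_{11})\alpha = O \;\Rightarrow\;
\left\{\frac{1}{4}\begin{bmatrix} 1 & 0\\ 0 & 2 \end{bmatrix} - \left(\frac{1}{2} - \frac{\sqrt{2}}{4}\right)\begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}\right\}
\begin{bmatrix} \alpha_1\\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}$$
$$\Rightarrow\; \begin{bmatrix} -1+\sqrt{2} & -2+\sqrt{2}\\ -2+\sqrt{2} & -2+2\sqrt{2} \end{bmatrix}
\begin{bmatrix} \alpha_1\\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0\\ 0 \end{bmatrix}
\;\Rightarrow\; a_2 = \begin{bmatrix} \sqrt{2}\\ 1 \end{bmatrix}$$

is one such vector. Let us normalize it. Consider

$$a_2'V_{11}a_2 = [\sqrt{2},\ 1]\begin{bmatrix} 1 & 1\\ 1 & 2 \end{bmatrix}\begin{bmatrix} \sqrt{2}\\ 1 \end{bmatrix} = 4 + 2\sqrt{2}.$$

Then

$$\alpha^{(2)} = \frac{1}{\sqrt{4+2\sqrt{2}}}\begin{bmatrix} \sqrt{2}\\ 1 \end{bmatrix}.$$

The second pair of canonical variables is then

$$(u_2,w_2) = (\alpha^{(2)\prime}X_{(1)},\ \beta^{(2)\prime}X_{(2)}),$$
$$u_2 = \frac{1}{\sqrt{4+2\sqrt{2}}}(\sqrt{2}\,x_1 + x_2),\quad
w_2 = -\frac{1}{2}x_3 + \frac{1}{2(2-\sqrt{2})}x_4.$$

It is easy to verify that $\operatorname{Cov}(u_1,u_2) = 0$, $\operatorname{Cov}(u_1,w_2) = 0$,
$\operatorname{Cov}(w_1,w_2) = 0$, $\operatorname{Cov}(w_1,u_2) = 0$.
Here $w_1$ is the best predictor of $u_1$ and $w_2$ is the best predictor of $u_2$, $(w_1,w_2)$ is the best
predictor of $(u_1,u_2)$ and vice versa.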


It is worth noting that the maximum number of such pairs possible is $p_1$ if $p_1 \le p_2$
or $p_2$ if $p_2 \le p_1$.
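The canonical correlations of Example 5.3.3 can be checked through the eigenvalue formulation in (f). The following is a sketch only, assuming NumPy; the variable names (`nu`, `betas`, `alphas`, `lam`) are introduced here, the matrix $V$ is the one of Example 5.3.3 (with the entry in row 2, column 3 taken as $-1$ so that $V$ is symmetric), and the coefficient vectors are obtained up to sign.

```python
import numpy as np

# Covariance matrix of Example 5.3.3 (symmetric; entry (2,3) taken as -1).
V = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 2.0, -1.0, 0.0],
              [1.0, -1.0, 6.0, 4.0],
              [1.0, 0.0, 4.0, 4.0]])
p1 = 2
V11, V12 = V[:p1, :p1], V[:p1, p1:]
V21, V22 = V[p1:, :p1], V[p1:, p1:]

# Squared canonical correlations: eigenvalues of V22^{-1} V21 V11^{-1} V12, as in (f).
M = np.linalg.solve(V22, V21 @ np.linalg.solve(V11, V12))
nu, B = np.linalg.eig(M)
order = np.argsort(nu.real)[::-1]
nu = nu.real[order]
B = B.real[:, order]

# Scale each column so that beta' V22 beta = 1.
betas = B / np.sqrt(np.einsum("ij,ik,kj->j", B, V22, B))

# Canonical correlations and the alphas recovered from (a): V12 beta = lambda V11 alpha.
lam = np.sqrt(nu)
alphas = np.linalg.solve(V11, V12 @ betas) / lam

print(nu)    # compare with 1/2 + sqrt(2)/4 and 1/2 - sqrt(2)/4
print(lam)   # the two canonical correlations
```

Column $j$ of `alphas` and `betas` gives the coefficients of the $j$-th pair $(u_j,w_j)$; the sign of each pair is arbitrary, so a column may differ from the vectors in the solution by a factor of $-1$ applied to both members of the pair.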

Factor analysis is another topic which is widely used in psychology, educational 
testing problems and applied statistics. This topic boils down to discussing some 
structural properties of matrices. In a large variety of testing problems of statistical 
hypotheses on the parameters of one or more multivariate Gaussian distributions the 
likelihood ratio test statistics reduce to ratios of determinants which often reduce 
to functions of eigenvalues of certain matrices. Thus the testing problem reduces 
to the determination of the null and non-null distributions of ratios of determi- 
nants or functions of eigenvalues. Generalized analysis of variance problems and 
generalized linear model problems essentially reduce to the study of certain determi- 
nants. 


Exercises 5.3 


5.3.1. Evaluate the principal components in X' = (xj, X2, x3) with the covariance matrix 


5.3.2. Check whether the following $V$ can represent a covariance matrix. If so evaluate
the principal components in the corresponding vector, $X' = (x_1,x_2,x_3)$.

$$V = \begin{bmatrix} 1 & -1 & 1\\ -1 & 2 & 0\\ 1 & 0 & 3 \end{bmatrix}$$


5.3.3. If the regression of y on x, and x; is of the form 
E(yba,x3) = ag + ax; + à3XX + d3X3 
estimate the regression function based on the following data: 


y 15 Gp. 2 445 
x 0 1 -1 1200 
3s 0 1 1 -1 0 2 -2 
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5.3.4. If the regression of y on x is 
E(y|x) = bg + b,x + by?) + by 
estimate the regression function based on the following data: 


x 01- 2 2 3 
y 15 1 16 -5 40 


and estimate y at x = 1.5 and at x = 2.5. 


5.3.5. Three groups of students are subjected to 3 different methods of teaching. The
experiment is conducted according to a completely randomized design so that the
grades obtained by the $j$-th student under the $i$-th method, $x_{ij}$, can be written as

$$x_{ij} = \mu + \alpha_i + e_{ij},\quad i = 1,2,3,\quad j = 1,\ldots,n_i$$

where $n_i$ is the size of group $i$, $\mu$ is a general effect, $\alpha_i$ is the deviation from the general
effect due to the $i$-th method and $e_{ij}$ is the random part. Evaluate the least square
estimates of $\alpha_1,\alpha_2,\alpha_3$ and the sum of squares due to the $\alpha_j$'s based on the following data
where $\mu,\alpha_1,\alpha_2,\alpha_3$ are constants. Grades obtained by the students are the following:
Method 1: (grades 80, 85, 90, 70, 75, 60, 70); 

Method 2: (grades 90, 90, 85, 80, 85, 70, 75, 60, 65, 70); 

Method 3: (grades 40, 50, 70, 60, 65, 50, 60, 65). 


5.3.6. Compute the canonical correlation between $\{x_1,x_3\}$ and $\{x_2,x_4\}$ if the covariance
matrix $V$ of $X' = (x_1,x_2,x_3,x_4)$ is given by the following:

$$V = \begin{bmatrix} 1 & -1 & 1 & 1\\ -1 & 4 & 0 & -1\\ 1 & 0 & 2 & 1\\ 1 & -1 & 1 & 3 \end{bmatrix}$$


5.3.7. Check whether the following matrix can be a covariance matrix of the vector 
random variable X’ = (x1,X2, x3, X4). If so compute the canonical correlations between 
{x,,x,} and x3]. 


$$V = \begin{bmatrix} 3 & 1 & -1 & 1\\ 1 & 2 & 0 & -1\\ -1 & 0 & 2 & 1\\ 1 & -1 & 1 & 3 \end{bmatrix}$$


5.4 Probability measures and Markov processes 


In many areas of measure theory, geometrical probability and stochastic processes,
matrices, determinants and eigenvalues play important roles. Two such typical examples
will be presented here, one from invariance properties of probability measures,
applicable in geometrical probability problems, and another one from discrete time
Markov processes.


5.4.1 Invariance of probability measures 


A random plane in a Euclidean $k$-space $R^k$ can be given in Cartesian coordinates
$x_1,\ldots,x_k$ as follows:

$$u_1x_1 + \cdots + u_kx_k + 1 = 0. \tag{5.4.1}$$

This plane does not pass through the origin. When the coefficients $u_1,\ldots,u_k$ are real
random variables we call (5.4.1) a random plane. We can write the plane in vector
notation as

$$U'X + 1 = 0,\quad U' = (u_1,\ldots,u_k),\quad X' = (x_1,\ldots,x_k). \tag{5.4.2}$$


Let us consider a rotation of the axes of coordinates. A rotation of the axes can, in
general, be represented by an orthonormal matrix $Q$, $QQ' = I$, $Q'Q = I$ where $I$ is the
identity matrix. Let $A' = (a_1,\ldots,a_k)$ be a translation of $U$. Let the new point $X$ be
denoted by $X_1$. Then

$$X_1 = Q(X - Q'A) \;\Rightarrow\; Q'X_1 + Q'A = X,\quad Q^{-1} = Q'$$

and then

$$U'X + 1 = 0 \;\Rightarrow\; U'[Q'X_1 + Q'A] + 1 = 0
\;\Rightarrow\; \frac{U'Q'X_1}{U'Q'A + 1} + 1 = 0 \tag{5.4.3}$$

where $U'Q'A + 1 \ne 0$ almost surely (a.s.). If the plane in (5.4.3) is denoted by $U_1'X_1 + 1 = 0$
then

$$U_1' = \frac{U'Q'}{U'Q'A + 1}. \tag{5.4.4}$$

An event $B$ on this plane is a function of the parameter vector $U$. Let a measure on $B$,
denoted by $m(B)$, be given by the integral

$$m(B) = \int_B f(U)\,dU,\quad dU = du_1\wedge\cdots\wedge du_k.$$

Under a translation and rotation let the resulting $f(U)$ and $B$ be denoted by $f_1(U_1)$ and
$B_1$ respectively. Let the corresponding measure be $m_1(B_1)$. Then

$$m_1(B_1) = \int_{B_1} f_1(U_1)\,dU_1.$$


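Invariance property of the measure under Euclidean motion implies that $m(B)$ and
$m_1(B_1)$ are the same. What should be the condition so that $m(B) = m_1(B_1)$? Let us
examine this a little further.

$$U_1' = \frac{U'Q'}{1 + U'Q'A} = \frac{V'}{1 + V'A},\quad V' = U'Q' = (v_1,\ldots,v_k).$$

Then the first element in this row vector is $\frac{v_1}{1 + V'A}$ and its partial derivatives with
respect to $v_1,\ldots,v_k$ yield

$$(1 + V'A)^{-2}\,[(1 + V'A) - v_1a_1,\ -v_1a_2,\ \ldots,\ -v_1a_k].$$

Then the Jacobian is the following determinant:

$$(1 + V'A)^{-2k}
\begin{vmatrix}
(1 + V'A) - v_1a_1 & -v_1a_2 & \cdots & -v_1a_k\\
-v_2a_1 & (1 + V'A) - v_2a_2 & \cdots & -v_2a_k\\
\vdots & \vdots & & \vdots\\
-v_ka_1 & -v_ka_2 & \cdots & (1 + V'A) - v_ka_k
\end{vmatrix}
= (1 + V'A)^{-(k+1)}.$$

Then

$$f(U) = f_1(U_1)\left|\frac{\partial U_1}{\partial U}\right| = f_1(U_1)(1 + V'A)^{-(k+1)}, \tag{5.4.5}$$

$$U_1'U_1 = \frac{U'U}{[1 + V'A]^2} \;\Rightarrow\; (1 + V'A)^{-1} = \left[\frac{U_1'U_1}{U'U}\right]^{1/2}.$$

Therefore

$$\frac{f(U)}{f_1(U_1)} = \left[\frac{U_1'U_1}{U'U}\right]^{\frac{k+1}{2}}.$$

But this is the same as

$$f(U)\,(U'U)^{\frac{k+1}{2}} = f_1(U_1)\,(U_1'U_1)^{\frac{k+1}{2}}.$$

Thus $f(U)$ is proportional to $(u_1^2 + \cdots + u_k^2)^{-\frac{k+1}{2}}$. Therefore the invariant measure is

$$m(B) = c_1\int_B \frac{dU}{(u_1^2 + \cdots + u_k^2)^{\frac{k+1}{2}}} \tag{5.4.6}$$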


where $c_1$ is a constant.

Example 5.4.1. Compute the invariant measure in Cartesian coordinates, invariant
under Euclidean motion, the invariant density and the element of the invariant measure
for a plane in 3-space $R^3$.

Solution 5.4.1. From (5.4.6) for $k = 3$ we have the invariant measure for a set $B$ given
by

$$m(B) = c_1\int_B \frac{1}{(u_1^2 + u_2^2 + u_3^2)^2}\,du_1\wedge du_2\wedge du_3$$

where $c_1$ is a constant. The invariant density is then

$$f(u_1,u_2,u_3) = c_2(u_1^2 + u_2^2 + u_3^2)^{-2}$$

where $c_2$ is a normalizing constant so that the total volume under $f$ is unity. Therefore
the element of the invariant measure, denoted by $dm$, is given by

$$dm = c_2(u_1^2 + u_2^2 + u_3^2)^{-2}\,du_1\wedge du_2\wedge du_3$$

where $(u_1,u_2,u_3) \ne (0,0,0)$ a.s.
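The Jacobian determinant of the map $V \to U_1 = V/(1 + V'A)$, namely $(1 + V'A)^{-(k+1)}$, can be checked numerically by finite differences. The sketch below assumes NumPy; the point `V` and the translation `A` are arbitrary values chosen for illustration, and the function names are introduced here.

```python
import numpy as np

def transform(V, A):
    """The map V -> U_1 = V / (1 + V'A)."""
    return V / (1.0 + V @ A)

def numerical_jacobian(f, V, h=1e-6):
    """Central-difference approximation of the Jacobian matrix of f at V."""
    k = V.size
    J = np.zeros((k, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = h
        J[:, j] = (f(V + step) - f(V - step)) / (2.0 * h)
    return J

k = 3
V = np.array([0.3, -0.2, 0.5])   # arbitrary point (illustrative)
A = np.array([1.0, 2.0, -1.0])   # arbitrary translation (illustrative)

J = numerical_jacobian(lambda v: transform(v, A), V)
det_numeric = np.linalg.det(J)
det_formula = (1.0 + V @ A) ** (-(k + 1))
print(det_numeric, det_formula)
```

The two printed values agree up to the accuracy of the finite-difference approximation.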


For more on measures, invariance and other topics such as random areas and vol- 
umes in higher dimensional Euclidean space where matrices and determinants play 
dominant roles see the book [4]. 


5.4.2 Discrete time Markov processes and transition probabilities 


In Example 2.2.6 of Section 2.2 in Chapter 2 we have considered a transition probability
matrix which is singly stochastic in the sense that all elements are non-negative and
further, either the sum of the elements in each row is 1 or the sum of the elements in
each column is 1. If this property holds for both rows and columns then the matrix is
called a doubly stochastic matrix. For example

$$\begin{bmatrix} \frac{1}{2} & \frac{1}{2}\\ \frac{1}{2} & \frac{1}{2} \end{bmatrix}$$
is doubly stochastic whereas 


08 02 03 04 
Bes | and a| | 


are singly stochastic. 
Consider a problem of the following type: Consider a fishing spot in a river such as 
a pool area in the river. Some fish move into the pool area from outside and some fish 
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from the pool move out every day. Suppose that the following is the process every day. 
70% of fish who are outside stay outside and 30% move into the pool. 50% of the fish 
in the pool stay there and 5096 move out of the pool. If stage 1 is *outside" and stage 
2 is "inside" then we have the following transition proportion matrix, if the columns 
represent the transitions: 


outside-1 inside=2 
0.7 0.5 
outside=1 ^07 05 » A-(-lo5 o5) 


inside = 2 0.3 0.5 


$a_{ij}$ = transition proportion from the j-th stage to the i-th stage. [If $a_{ij}$ represents the
transition proportion from the i-th stage to the j-th stage then we have the transpose
of the above matrix A.] For convenience of other interpretations later on we take $a_{ij}$
to denote the transition from the j-th stage to the i-th stage so that the sum of the
elements in each column is 1 in the above singly stochastic matrix. If this process is
repeated every day then at the end of the first day $A$ is the situation, by the end of the
second day $A^2$ is the situation (see the details of the argument in Example 2.2.6) and at
the end of the k-th day the situation is $A^k$. What is $A^k$ in this case? In order to compute
$A^k$ let us compute the eigenvalues and the matrix of eigenvectors. Consider $|A - \lambda I| = 0$.
That is,


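$$\begin{vmatrix} 0.7-\lambda & 0.5 \\ 0.3 & 0.5-\lambda \end{vmatrix} = 0 \;\Rightarrow\; \lambda_1 = 1,\ \lambda_2 = 0.2.$$

If we add all the rows to the first row then the first row becomes $1-\lambda,\ 1-\lambda$. Then $1-\lambda$
can be taken out. Then $\lambda_1 = 1$ is an eigenvalue for any singly stochastic matrix.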


(i) One eigenvalue of any singly stochastic matrix is 1. 


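Computing the eigenvectors for our matrix A we have, for $\lambda_1 = 1$,

$$(A - \lambda_1 I)X = O \;\Rightarrow\; \begin{pmatrix} 0.7-1 & 0.5 \\ 0.3 & 0.5-1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \;\Rightarrow\; X_1 = \begin{pmatrix} 1 \\ 0.6 \end{pmatrix}$$

is one eigenvector. For $\lambda_2 = 0.2$,

$$(A - \lambda_2 I)X = O \;\Rightarrow\; X_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

is an eigenvector. Let

$$Q = (X_1, X_2) = \begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 0.2 \end{pmatrix}.$$

Then

$$Q^{-1} = \frac{1}{1.6}\begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix}, \qquad A = Q\Lambda Q^{-1}.$$

Therefore

$$A^k = (Q\Lambda Q^{-1})\cdots(Q\Lambda Q^{-1}) = Q\Lambda^k Q^{-1} = \begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & (0.2)^k \end{pmatrix}\frac{1}{1.6}\begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix}.$$

If $k \to \infty$ then $(0.2)^k \to 0$. Then

$$A^k \to A^{\infty} = \begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\frac{1}{1.6}\begin{pmatrix} 1 & 1 \\ 0.6 & -1 \end{pmatrix} = \frac{1}{1.6}\begin{pmatrix} 1 & 1 \\ 0.6 & 0.6 \end{pmatrix}.$$

$A^{\infty} = \lim_{k\to\infty} A^k$ represents the steady state. Suppose there were 10 000 fish outside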
the pool and 500 inside the pool to start with. Then at the end of the first day the 
numbers will be the following: 


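$$\begin{pmatrix} 0.7 & 0.5 \\ 0.3 & 0.5 \end{pmatrix}\begin{pmatrix} 10000 \\ 500 \end{pmatrix} = \begin{pmatrix} 7250 \\ 3250 \end{pmatrix}.$$

That is, 7 250 fish outside the pool and 3 250 inside the pool. At the end of the second
day the numbers are

$$\begin{pmatrix} 0.7 & 0.5 \\ 0.3 & 0.5 \end{pmatrix}^2\begin{pmatrix} 10000 \\ 500 \end{pmatrix} = \begin{pmatrix} 0.64 & 0.60 \\ 0.36 & 0.40 \end{pmatrix}\begin{pmatrix} 10000 \\ 500 \end{pmatrix} = \begin{pmatrix} 6700 \\ 3800 \end{pmatrix}.$$

That is, 6 700 fish outside the pool and 3 800 inside the pool. Evidently, in the long
run the numbers will be

$$A^{\infty}\begin{pmatrix} 10000 \\ 500 \end{pmatrix} = \frac{1}{1.6}\begin{pmatrix} 1 & 1 \\ 0.6 & 0.6 \end{pmatrix}\begin{pmatrix} 10000 \\ 500 \end{pmatrix} = \begin{pmatrix} 6562.5 \\ 3937.5 \end{pmatrix}.$$

Even though 0.5 fish does not make sense this vector is the eventual limiting vector
with 6 562.5 outside and 3 937.5 inside.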
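
The day-by-day numbers above can be reproduced by repeated multiplication. The following is a minimal sketch in plain Python with exact fractions; the helper name `mat_vec` and the choice of 60 iterations are illustrative assumptions, not part of the text.

```python
from fractions import Fraction as F

def mat_vec(M, v):
    """Multiply a square matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Column-stochastic matrix of the fish example
# (column j describes where the fish in stage j go; 1 = outside, 2 = inside).
A = [[F(7, 10), F(5, 10)],
     [F(3, 10), F(5, 10)]]
x0 = [F(10000), F(500)]

# Day-by-day populations: x_k = A x_{k-1}.
history = [x0]
for _ in range(60):
    history.append(mat_vec(A, history[-1]))
day1, day2 = history[1], history[2]

# Closed-form limit from the text: A^infinity = (1/1.6) [[1, 1], [0.6, 0.6]].
A_inf = [[F(10, 16), F(10, 16)],
         [F(6, 16), F(6, 16)]]
limit = mat_vec(A_inf, x0)

print([float(v) for v in day1])
print([float(v) for v in day2])
print([float(v) for v in limit])
print([float(v) for v in history[60]])  # iterate after 60 days, close to the limit
```

The gap between the k-th iterate and the limit shrinks like $(0.2)^k$, which is why a few dozen days already agree with $A^{\infty}X_0$ to many decimal places.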

Let us look at the general situation. Let $P = (p_{ij})$ be the transition probability matrix,

$$P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nn} \end{pmatrix}$$

with the sum of the elements in each column 1, that is, $\sum_{i=1}^{n} p_{ij} = 1$ for each $j = 1,\ldots,n$.

Then, as we have already seen, one eigenvalue of P is 1. What about the other eigenvalues
of P? Note that the sum of the eigenvalues of P is the trace of P. That is,

$$\mathrm{tr}(P) = p_{11} + \cdots + p_{nn} = \lambda_1 + \cdots + \lambda_n$$

where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of P with $\lambda_1 = 1$. But $P^2, P^3, \ldots$ being transition
probability matrices, all have the same property that the sum of the elements in each
column is 1. Hence $\mathrm{tr}(P^k)$ cannot exceed n because $p_{11}^{(k)},\ldots,p_{nn}^{(k)}$ are all probabilities, where
$p_{11}^{(k)},\ldots,p_{nn}^{(k)}$ are the diagonal elements in $P^k$. Note that $p_{ii}^{(k)} \ne p_{ii}^{k}$ in general. But the eigenvalues
of $P^k$ are $\lambda_1^k = 1, \lambda_2^k, \ldots, \lambda_n^k$. Then

$$\lambda_2^k + \cdots + \lambda_n^k \le n-1 \qquad (5.4.7)$$

where n is fixed and k could be arbitrarily large. But (5.4.7) can hold only if $|\lambda_j| \le 1$ for
all $j = 1,\ldots,n$.


(ii) The eigenvalues of a singly stochastic matrix are all less than or equal to 1 in 
absolute value. 


Example 5.4.2. Suppose that a flu virus is going around in a big school system. The 
children there are only of 3 types, healthy (unaffected), infected, seriously ill. The 
probability that a healthy child remains healthy at the end of the day is 0.2, becomes 
infected is 0.5, becomes seriously ill is 0.3 and suppose the following is the transition 
probability matrix: 


$$P = \begin{pmatrix} 0.2 & 0.1 & 0.1 \\ 0.5 & 0.5 & 0.6 \\ 0.3 & 0.4 & 0.3 \end{pmatrix}.$$


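Suppose that this P remains the same from day to day. Compute the following: (1) The
transition probability matrix after 10 days; (2) The transition probability matrix eventually;
(3) If there are 1000 children in the healthy category, 500 in the infected category
and 100 in the seriously ill category at the start of the observation period (zeroth
day) what will be these numbers eventually?

Solution 5.4.2. In order to answer all the above questions we need the eigenvalues
and eigenvectors of P. Consider $|P - \lambda I| = 0$. That is,

$$\begin{vmatrix} 0.2-\lambda & 0.1 & 0.1 \\ 0.5 & 0.5-\lambda & 0.6 \\ 0.3 & 0.4 & 0.3-\lambda \end{vmatrix} = (1-\lambda)\begin{vmatrix} 1 & 1 & 1 \\ 0.5 & 0.5-\lambda & 0.6 \\ 0.3 & 0.4 & 0.3-\lambda \end{vmatrix} = (1-\lambda)[\lambda^2 - (0.1)^2].$$

The roots are $\lambda_1 = 1$, $\lambda_2 = 0.1$, $\lambda_3 = -0.1$. For $\lambda_1 = 1$,

$$(P - \lambda_1 I)X = O \;\Rightarrow\; X_1 = \begin{pmatrix} 2.2/7 \\ 10.6/7 \\ 1 \end{pmatrix}.$$

For $\lambda_2 = 0.1$ and for $\lambda_3 = -0.1$ we have

$$X_2 = \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}, \qquad X_3 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}.$$

Let

$$Q = \begin{pmatrix} 2.2/7 & -2 & 0 \\ 10.6/7 & 1 & 1 \\ 1 & 1 & -1 \end{pmatrix}.$$

Then

$$P = Q\Lambda Q^{-1}, \qquad \Lambda = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.1 & 0 \\ 0 & 0 & -0.1 \end{pmatrix}, \qquad Q^{-1} = \frac{1}{39.6}\begin{pmatrix} 14 & 14 & 14 \\ -17.6 & 2.2 & 2.2 \\ -3.6 & 16.2 & -23.4 \end{pmatrix}.$$

Now we can answer all the questions.

$$P^{10} = Q\Lambda^{10}Q^{-1} = Q\begin{pmatrix} 1 & 0 & 0 \\ 0 & (0.1)^{10} & 0 \\ 0 & 0 & (-0.1)^{10} \end{pmatrix}Q^{-1}, \qquad (\pm 0.1)^{10} \approx 0,$$

$$P^{10} \approx \frac{14}{39.6}\begin{pmatrix} 2.2/7 & 2.2/7 & 2.2/7 \\ 10.6/7 & 10.6/7 & 10.6/7 \\ 1 & 1 & 1 \end{pmatrix} = P^{\infty}.$$

If the initial vector is $X_0' = (1000, 500, 100)$ then the eventual situation is $P^{\infty}X_0$. That
is,

$$P^{\infty}X_0 = \frac{14}{39.6}\begin{pmatrix} 2.2/7 & 2.2/7 & 2.2/7 \\ 10.6/7 & 10.6/7 & 10.6/7 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1000 \\ 500 \\ 100 \end{pmatrix} = \begin{pmatrix} 1600/9 \\ 53(1600)/99 \\ 35(1600)/99 \end{pmatrix} = \begin{pmatrix} 177.78 \\ 856.57 \\ 565.66 \end{pmatrix}.$$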


Thus, eventually one can expect 178 children in the healthy category, 856 in the in- 
fected category and 566 in the very ill category according to the transition probability 
matrix P. 
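
As a numerical cross-check of Solution 5.4.2, one can form $P^{10}$ by repeated multiplication and compare it with the limiting matrix. The sketch below uses plain Python with exact fractions; the helper `mat_mul` and the variable names are illustrative assumptions.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[F(2, 10), F(1, 10), F(1, 10)],
     [F(5, 10), F(5, 10), F(6, 10)],
     [F(3, 10), F(4, 10), F(3, 10)]]

# P^10 by repeated multiplication.
P10 = P
for _ in range(9):
    P10 = mat_mul(P10, P)

# Limit from the eigen-decomposition: every column equals X_1 scaled
# to add up to 1, that is (2.2, 10.6, 7)/19.8 = (1/9, 53/99, 35/99).
col = [F(1, 9), F(53, 99), F(35, 99)]
P_inf = [[c, c, c] for c in col]

max_gap = max(abs(P10[i][j] - P_inf[i][j]) for i in range(3) for j in range(3))
x0 = [1000, 500, 100]
eventual = [sum(P_inf[i][j] * x0[j] for j in range(3)) for i in range(3)]

print(float(max_gap))                          # of the order of (0.1)**10
print([round(float(v), 2) for v in eventual])  # eventual numbers in each category
```

Since the two non-unit eigenvalues are $\pm 0.1$, the entries of $P^{10}$ already agree with those of $P^{\infty}$ to roughly ten decimal places.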


In the two examples above, we have noted that the steady state or the eventual
state of the initial vector $X_0$, that is $P^{\infty}X_0$, is nothing but a scalar multiple of the
eigenvector corresponding to the eigenvalue $\lambda_1 = 1$, that is $X_1$ in our notation. When $X_1$
is scaled so that its elements add up to 1,

$$P^{\infty}X_0 = (\text{sum of the elements in } X_0)\,X_1.$$

This, in fact, is a general result if all the elements in P are strictly positive, that is, no
element in P is zero. Then all other eigenvalues will be strictly less than 1 in absolute
value also.


(iii) If all the elements in a singly stochastic matrix P are strictly positive then one
eigenvalue of P is 1 and all other eigenvalues of P are less than 1 in absolute value.
Then the steady state is the sum of the elements in the initial vector multiplied by
$X_1$, the eigenvector corresponding to the eigenvalue $\lambda_1 = 1$, scaled so that its elements add up to 1.
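
Result (iii) can be checked numerically. The sketch below approximates the unit-sum eigenvector for $\lambda_1 = 1$ by repeated multiplication (a simple power iteration, which is an assumption of this sketch and not the eigenvector computation used in the text) and compares (sum of $X_0$) times that vector with a direct iteration of $X_0$. The function names are hypothetical.

```python
def iterate(P, x0, steps=200):
    """Approximate P^k x0 for large k by repeated multiplication.
    Assumes P is column-stochastic with strictly positive entries."""
    x = list(x0)
    for _ in range(steps):
        x = [sum(p * v for p, v in zip(row, x)) for row in P]
    return x

def unit_sum_eigenvector(P, steps=200):
    """Eigenvector for eigenvalue 1, scaled so that its elements add up to 1."""
    n = len(P)
    return iterate(P, [1.0 / n] * n, steps)

P = [[0.7, 0.5],
     [0.3, 0.5]]          # fish example
x0 = [10000.0, 500.0]

x1 = unit_sum_eigenvector(P)
predicted = [sum(x0) * v for v in x1]   # (sum of the elements in X_0) * X_1
direct = iterate(P, x0)                 # P^k X_0 for large k
print(x1, predicted, direct)
```

Because the columns of P add up to 1, each multiplication preserves the sum of the vector, which is why starting from a unit-sum vector yields the unit-sum eigenvector.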


Exercises 5.4 


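5.4.1. Show that a plane in 3-space $R^3$ can be given by the equation

$$x\sin\phi\cos\theta + y\sin\phi\sin\theta + z\cos\phi = \rho,$$

for $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi$, $0 \le \rho < \infty$ where $\rho$ is the perpendicular distance of the plane
from the origin, and $\theta$ and $\phi$ are the polar angles.


5.4.2. Compute the Jacobian in the transformation

$$x = \rho\sin\phi\cos\theta, \qquad y = \rho\sin\phi\sin\theta, \qquad z = \rho\cos\phi.$$

5.4.3. Show that the element of the invariant density in $R^3$ in polar coordinates is
given by

$$\mathrm{d}m = \lambda|\sin\phi|\,\mathrm{d}\theta\wedge\mathrm{d}\phi\wedge\mathrm{d}\rho, \qquad \lambda \text{ a constant}.$$

5.4.4. Consider the following general polar coordinate transformation:

$$\begin{aligned}
x_1 &= \rho\sin\theta_1\sin\theta_2\cdots\sin\theta_{k-2}\sin\theta_{k-1}\\
x_2 &= \rho\sin\theta_1\sin\theta_2\cdots\sin\theta_{k-2}\cos\theta_{k-1}\\
x_3 &= \rho\sin\theta_1\sin\theta_2\cdots\cos\theta_{k-2}\\
&\ \,\vdots\\
x_{k-1} &= \rho\sin\theta_1\cos\theta_2\\
x_k &= \rho\cos\theta_1
\end{aligned}$$

for $0 \le \theta_j \le \pi$, $j = 1,\ldots,k-2$, $0 \le \theta_{k-1} \le 2\pi$, $0 \le \rho < \infty$. (1) Compute the Jacobian;
(2) Show that the invariant density, invariant under Euclidean motion, is given by

$$f(\rho,\theta_1,\ldots,\theta_{k-1}) = \lambda_k\prod_{j=1}^{k-1}|\sin\theta_j|^{k-1-j}, \qquad \lambda_k \text{ a constant}.$$
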
5.4.5. Mr. Good's job requires frequent travels abroad on behalf of his company in California.
If he is in California the chance that he will stay in California on the same day
is only 10% and the chance that he will be outside California is 90%. If he is outside
California the chance that he will stay outside on the same day is 80% and that he will
come to California is 20%. This is the daily routine. (1) Compute the transition probability
matrix, P, for any given day; (2) Compute the transition probability matrix for
the 10-th day of observation, $P^{10}$; (3) Compute the eventual behavior of the transition
probability matrix, $P^{\infty}$.


5.4.6. For a terminally ill patient suppose that there are only two possible stages of 
transition, dead or terminally ill for any given day. Suppose that the chance that the 
patient is still ill the next day is 0.5. Compute (1) the transition probability matrix P; 
(2) Compute P?; (3) Compute P*?. 


5.4.7. For the following transition probability matrices compute the steady state if the
initial state vector is $X_0$:

$$P_1 = \begin{pmatrix} 0.4 & 0.7 \\ 0.6 & 0.3 \end{pmatrix}, \qquad P_2 = \begin{pmatrix} 0.2 & 0.4 & 0.3 \\ 0.3 & 0.5 & 0.2 \\ 0.5 & 0.1 & 0.5 \end{pmatrix}, \qquad P_3 = \begin{pmatrix} 1 & 0.5 & 0.2 \\ 0 & 0 & 0.4 \\ 0 & 0.5 & 0.4 \end{pmatrix}.$$


5.4.8. Let P be a transition probability matrix with the sum of the elements in each
column 1. Let $PX = \lambda X$. Show that if $\lambda \ne 1$ then the sum of the elements in X is zero, that
is, if $X_j$ is an eigenvector corresponding to $\lambda_j \ne 1$ then $J'X_j = 0$, $J' = (1,\ldots,1)$.
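5.4.9. Compute the steady states for the matrices $P_1$, $P_2$, $P_3$ in Exercise 5.4.7 if the initial
states are the following:

$$\text{(i) } X_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \text{(ii) } X_0 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} \quad \text{for } P_1;$$

$$\text{(i) } X_0 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \text{(ii) } X_0 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \text{(iii) } X_0 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \quad \text{for } P_2;$$

$$\text{(i) } X_0 = \begin{pmatrix} 20 \\ 30 \\ 10 \end{pmatrix}, \quad \text{(ii) } X_0 = \begin{pmatrix} 10 \\ 10 \\ 10 \end{pmatrix} \quad \text{for } P_3.$$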


5.4.10. In a particular township there are 3 grocery stores. The number of households
in that township is fixed, only 200. Initially the three stores have 100, 50, 50 households
respectively as customers. The stores started weekly sales. Depending upon the
good sales the customers started moving from store to store. Suppose that the chances
of a customer of store 1 to remain or move to store 2 or 3 are respectively 0.7, 0.2, 0.1. Suppose
the weekly transition probability matrix is

$$A = \begin{pmatrix} 0.7 & 0.5 & 0.6 \\ 0.2 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.2 \end{pmatrix}.$$


What will be the numbers in column one of the transition matrix after 3 weeks? When 
can we expect store 3 to close if less than 20 customers is not a viable operation? 


5.4.11. Suppose there are four popular brands of detergents in the market. These
brands have initially 40%, 30%, 20%, 10% of the customers respectively. It is found
that the customers move from brand to brand at the end of every month. Answer the
following questions if (1) A and (2) B is the transition matrix for every month:


0.4 05 04 04 
"s 0.2 04 04 05 ? B= 
02 01 01 0.0 


0.2 00 01 04 


POO O° 
oro°o 
oor Oo 
oOo F 


What will be the percentages of customers after (i) two months, (ii) three months, and 
(iii) eventually, for each A and B? 


5.5 Maxima/minima problems 


The basic maxima/minima problems were already discussed in earlier chapters. Here 
we will consider a unified way of treating the problem and then look into some more 
applications. 
5.5.1 Taylor series


From elementary calculus the student may be familiar with the Taylor series expansion
for one real variable x. The expansion of a function f(x) at x = a is given by

$$f(x) = f(a) + \frac{(x-a)}{1!}f^{(1)}(a) + \frac{(x-a)^2}{2!}f^{(2)}(a) + \cdots$$

where $f^{(r)}(a)$ denotes the r-th derivative of f(x) with respect to x and then evaluated
at x = a. Let us denote $D_0^r$ as the r-th derivative operator evaluated at a given point so
that $D_0^r f$ will indicate $f^{(r)}(a)$. Then consider the exponential series

$$e^{(x-a)D_0} = (x-a)^0D_0^0 + \frac{(x-a)D_0}{1!} + \frac{(x-a)^2D_0^2}{2!} + \cdots$$

so that $e^{(x-a)D_0}$ operating on f(x) gives

$$f(a) + \frac{(x-a)}{1!}f^{(1)}(a) + \frac{(x-a)^2}{2!}f^{(2)}(a) + \cdots$$

where $(x-a)^0D_0^0 f = D_0^0 f = f(a)$. This is the expansion for f(x) at x = a. For a function
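of two real variables, $f(x_1,x_2)$, if the Taylor series expansion is needed at the point
$(x_1,x_2) = (a_1,a_2)$ we can achieve it by considering the operators $D_{i(0)}$ = partial derivative
operator with respect to $x_i$ evaluated at the point $(a_1,a_2)$ for i = 1,2, and the linear
form

$$\delta = (x_1-a_1)D_{1(0)} + (x_2-a_2)D_{2(0)}.$$

Consider the exponential series

$$e^{\delta} = \delta^0 + \frac{\delta}{1!} + \frac{\delta^2}{2!} + \cdots$$

Then this operator, operating on $f(x_1,x_2)$ is given by

$$\begin{aligned}
f(a_1,a_2) &+ \frac{1}{1!}\left[(x_1-a_1)\frac{\partial}{\partial x_1}f(a_1,a_2) + (x_2-a_2)\frac{\partial}{\partial x_2}f(a_1,a_2)\right]\\
&+ \frac{1}{2!}\left[(x_1-a_1)^2\frac{\partial^2}{\partial x_1^2}f(a_1,a_2) + (x_2-a_2)^2\frac{\partial^2}{\partial x_2^2}f(a_1,a_2) + 2(x_1-a_1)(x_2-a_2)\frac{\partial^2}{\partial x_1\partial x_2}f(a_1,a_2)\right]\\
&+ \cdots = f(x_1,x_2),
\end{aligned}$$

where, for example, $\frac{\partial^r}{\partial x_j^r}f(a_1,a_2)$ indicates the r-th partial derivative of f with respect to
$x_j$, evaluated at the point $(a_1,a_2)$. If we have a function of k real variables, $f(x_1,\ldots,x_k)$,
and if a Taylor series expansion at $(a_1,\ldots,a_k)$ is needed then it is available from the
operator

$$e^{\delta}, \qquad \delta = (x_1-a_1)D_{1(0)} + \cdots + (x_k-a_k)D_{k(0)}$$

operating on f. Denoting $a = (a_1,\ldots,a_k)$ we have

$$\begin{aligned}
f(x_1,\ldots,x_k) = f(a) &+ \frac{1}{1!}\left[(x_1-a_1)\frac{\partial}{\partial x_1}f(a) + \cdots + (x_k-a_k)\frac{\partial}{\partial x_k}f(a)\right]\\
&+ \frac{1}{2!}\left[(x_1-a_1)^2\frac{\partial^2}{\partial x_1^2}f(a) + \cdots + (x_k-a_k)^2\frac{\partial^2}{\partial x_k^2}f(a)\right.\\
&\qquad + 2(x_1-a_1)(x_2-a_2)\frac{\partial^2}{\partial x_1\partial x_2}f(a) + \cdots\\
&\qquad \left. + 2(x_{k-1}-a_{k-1})(x_k-a_k)\frac{\partial^2}{\partial x_{k-1}\partial x_k}f(a)\right]\\
&+ \frac{1}{3!}[\cdots] + \cdots
\end{aligned}$$

If $a = (a_1,\ldots,a_k)$ is a critical point for $f(x_1,\ldots,x_k)$ then $\frac{\partial}{\partial x_j}f(a) = 0$, $j = 1,\ldots,k$. Then
in the neighborhood of the point a, that is, $x_i - a_i = \Delta x_i$, $i = 1,\ldots,k$, where the $\Delta x_i$'s are
infinitesimally small, we have

$$f(x_1,\ldots,x_k) - f(a_1,\ldots,a_k) = \frac{1}{2!}\delta^2 f + \frac{1}{3!}\delta^3 f + \cdots$$

where

$$\delta^2 f = (\Delta x_1)^2\frac{\partial^2}{\partial x_1^2}f(a) + \cdots + 2(\Delta x_{k-1})(\Delta x_k)\frac{\partial^2}{\partial x_{k-1}\partial x_k}f(a).$$

But $\delta^2 f$ is the following quadratic form:

$$\delta^2 f = [\Delta x_1,\ldots,\Delta x_k]\begin{pmatrix} \frac{\partial^2}{\partial x_1^2}f(a) & \cdots & \frac{\partial^2}{\partial x_1\partial x_k}f(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_k\partial x_1}f(a) & \cdots & \frac{\partial^2}{\partial x_k^2}f(a) \end{pmatrix}\begin{pmatrix} \Delta x_1 \\ \vdots \\ \Delta x_k \end{pmatrix}.$$

This term decides the sign of $f(x_1,\ldots,x_k) - f(a_1,\ldots,a_k)$ in the neighborhood of $a = (a_1,\ldots,a_k)$.
This term will remain positive for all $\Delta x_1,\ldots,\Delta x_k$ if the matrix of this
quadratic form, namely,

$$A_0 = \frac{\partial}{\partial X}\frac{\partial}{\partial X'}f(a) = \begin{pmatrix} \frac{\partial^2 f(a)}{\partial x_1^2} & \cdots & \frac{\partial^2 f(a)}{\partial x_1\partial x_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(a)}{\partial x_k\partial x_1} & \cdots & \frac{\partial^2 f(a)}{\partial x_k^2} \end{pmatrix}$$

is positive definite. That is, in the neighborhood of $a = (a_1,\ldots,a_k)$ the function is increasing
or a corresponds to a local minimum. Similarly if $A_0$ is negative definite then
a corresponds to a local maximum. If $A_0$ is indefinite or semi-definite then we say that
a is a saddle point. We have already given three definitions for the definiteness of a real
symmetric or Hermitian matrix. The one in terms of the leading determinants (leading
minors) of $A_0$ is the most convenient one to apply in this case. Thus at the point a we
have a


local minimum if all the leading minors of $A_0$ are positive,


local maximum if the leading minors of $A_0$ are alternately negative, positive, negative, and so on.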


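Example 5.5.1. Check for maxima/minima of $f(x,y) = 2x^2 + y^2 + 2xy - 3x - 4y + 5$.


Solution 5.5.1. Consider the equations $\frac{\partial f}{\partial x} = 0$ and $\frac{\partial f}{\partial y} = 0$.

$$\frac{\partial f}{\partial x} = 0 \;\Rightarrow\; 4x + 2y - 3 = 0, \qquad \frac{\partial f}{\partial y} = 0 \;\Rightarrow\; 2x + 2y - 4 = 0 \;\Rightarrow\; x = -\frac{1}{2},\ y = \frac{5}{2}.$$

There is only one critical point $(x,y) = (-\frac{1}{2}, \frac{5}{2})$. Now, consider the matrix of second order
derivatives, evaluated at this point.

$$A_0 = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\partial y} \\ \frac{\partial^2 f}{\partial y\partial x} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 4 & 2 \\ 2 & 2 \end{pmatrix}.$$

In this case since the second order derivatives are free of the variables there is no need
to evaluate at the critical point. The leading minors of this matrix $A_0$ are the following:

$$4 > 0, \qquad \begin{vmatrix} 4 & 2 \\ 2 & 2 \end{vmatrix} = 8 - 4 > 0.$$

Hence the point $(-\frac{1}{2}, \frac{5}{2})$ corresponds to a minimum.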
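
The leading-minor test used in Solution 5.5.1 is easy to mechanize for small matrices. The following is a minimal sketch; the function names `det`, `leading_minors` and `classify` are hypothetical, and cofactor expansion is chosen only because the matrices here are tiny.

```python
from fractions import Fraction as F

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def leading_minors(M):
    """Determinants of the upper-left 1x1, 2x2, ..., nxn blocks."""
    return [det([row[:k] for row in M[:k]]) for k in range(1, len(M) + 1)]

def classify(H):
    """Classify a critical point from the matrix H of second order derivatives."""
    minors = leading_minors(H)
    if all(m > 0 for m in minors):
        return "local minimum"
    # Alternating signs: negative, positive, negative, ...
    if all((m < 0) if k % 2 == 0 else (m > 0) for k, m in enumerate(minors)):
        return "local maximum"
    return "not decided by this test"

# Example 5.5.1: f = 2x^2 + y^2 + 2xy - 3x - 4y + 5
H = [[4, 2], [2, 2]]
x, y = F(-1, 2), F(5, 2)                          # critical point
grad = (4 * x + 2 * y - 3, 2 * x + 2 * y - 4)     # both partial derivatives
print(leading_minors(H), classify(H), grad)
```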


In this example $f(x,y)$ is of the form of a quadratic form $2x^2 + y^2 + 2xy$ plus a linear
form $-3x - 4y$ plus a constant 5. When the matrix of the quadratic form is positive
definite then we can devise a general procedure without using calculus. Consider a
quadratic expression of the type

$$u = X'AX - 2b'X + c \qquad \text{(a)}$$

where $A = A' > 0$, b, c are known and $X' = (x_1,\ldots,x_n)$, $b' = (b_1,\ldots,b_n)$. Let us minimize
u without using calculus. In order to illustrate the method let us open up a quadratic
form of the following type where $\beta$ is an n x 1 vector:

$$(X-\beta)'A(X-\beta) = X'AX - 2\beta'AX + \beta'A\beta. \qquad \text{(b)}$$

Comparing (b) with (a) we note that for $\beta' = b'A^{-1}$ we can write

$$u = (X - A^{-1}b)'A(X - A^{-1}b) - b'A^{-1}b + c. \qquad \text{(c)}$$

But $b'A^{-1}b$ and c are free of X. Hence maxima/minima of u depends upon the maxima/minima
of the quadratic form $(X - A^{-1}b)'A(X - A^{-1}b)$ which is positive definite.
Hence the maximum is at $+\infty$ and a minimum is achieved when

$$X - A^{-1}b = O \;\Rightarrow\; X = A^{-1}b.$$

In Example 5.5.1 this point $A^{-1}b$ is what we got since in that example

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 3/2 \\ 2 \end{pmatrix}, \qquad c = 5.$$

Then

$$A^{-1}b = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 3/2 \\ 2 \end{pmatrix} = \begin{pmatrix} -1/2 \\ 5/2 \end{pmatrix}.$$

This type of a technique is usually used in a calculus-free course on model building
and other statistical procedures.
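
The completed-square formula (c) translates directly into a computation: solve $AX = b$ for the minimizer and evaluate $c - b'A^{-1}b$ for the minimum. The sketch below uses a small exact Gauss-Jordan solver; the helper `solve` is an illustrative assumption and presumes a nonsingular A.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions.
    Assumes A is square and nonsingular."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

# u = X'AX - 2b'X + c with the data of Example 5.5.1.
A = [[2, 1], [1, 1]]
b = [F(3, 2), 2]
c = 5

x_min = solve(A, b)                                   # X = A^{-1} b
u_min = c - sum(bi * xi for bi, xi in zip(b, x_min))  # c - b'A^{-1}b

def f(x, y):
    return 2 * x * x + y * y + 2 * x * y - 3 * x - 4 * y + 5

print(x_min, u_min, f(*x_min))
```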


Example 5.5.2. n measurements $x_1,\ldots,x_n$ are made on an unknown quantity a. Estimate
a by minimizing the sum of squares of the measurement errors, and without
using calculus.


Solution 5.5.2. The measurement errors are $x_1 - a,\ldots,x_n - a$. The sum of squares of
the measurement errors is then $\sum_{i=1}^{n}(x_i - a)^2$. What should be a so that $\sum_{i=1}^{n}(x_i - a)^2$ is
the minimum? If we are using calculus then we differentiate with respect to a, equate
to zero and solve. Without using calculus we may proceed as follows:

$$\sum_{i=1}^{n}(x_i - a)^2 = \sum_{i=1}^{n}(x_i - \bar{x} + \bar{x} - a)^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + n(\bar{x} - a)^2, \qquad \bar{x} = \frac{x_1 + \cdots + x_n}{n},$$

since the cross-product term $2(\bar{x} - a)\sum_{i=1}^{n}(x_i - \bar{x})$ is zero. But for real numbers

$$\sum_{i=1}^{n}(x_i - \bar{x})^2 \ge 0 \quad\text{and}\quad n(\bar{x} - a)^2 \ge 0.$$

Only the term $n(\bar{x} - a)^2$ contains a. Hence the minimum is achieved when $n(\bar{x} - a)^2 = 0 \Rightarrow a = \bar{x}$.
This, in fact, is a special case of a general result.


(i) The mean squared deviations is least when the deviations are taken from the
mean value.


This same procedure of completing a quadratic form can be used for general model
building also. Consider the general linear model of the regression type considered in
Section 2.7.5 of Chapter 2. There the error sum of squares, denoted by $e'e$, can be written
in the form

$$e'e = (Y - X\beta)'(Y - X\beta)$$

where Y is an n x 1 vector of known observations, X is n x m, $n \ge m$, and known, and
$\beta$ is m x 1 and unknown. This parameter vector $\beta$ is to be estimated by minimizing
the error sum of squares. A calculus-free procedure, using only matrix methods, is the
following when X is of full rank, that is, the rank of X is m or when $X'X$ is nonsingular:
Note that

$$e'e = Y'Y - 2\beta'X'Y + \beta'X'X\beta.$$

Among terms on the right, $Y'Y$ does not contain $\beta$. Hence write

$$\beta'X'X\beta - 2\beta'X'Y = [\beta - (X'X)^{-1}X'Y]'(X'X)[\beta - (X'X)^{-1}X'Y] - Y'X(X'X)^{-1}X'Y.$$

The only term on the right containing $\beta$ is the quadratic form where the matrix of
the quadratic form, namely $X'X$, is positive definite. Hence the minimum is achieved
when this part is zero or when

$$\beta - (X'X)^{-1}X'Y = O \;\Rightarrow\; \beta = (X'X)^{-1}X'Y.$$

This is the least square solution obtained through calculus in Section 2.7.5 and the
least square minimum, denoted by $s^2$, is then

$$s^2 = Y'Y - Y'X(X'X)^{-1}X'Y = Y'[I - X(X'X)^{-1}X']Y.$$
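
To see the formulas $\beta = (X'X)^{-1}X'Y$ and $s^2 = Y'Y - Y'X(X'X)^{-1}X'Y$ at work, here is a small sketch on made-up data (four observations, an intercept column and one regressor; the data and the helper names `solve` and `normal_equations` are assumptions for illustration only).

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions.
    Assumes A is square and nonsingular."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

def normal_equations(X, Y):
    """Return X'X and X'Y for a design matrix X (list of rows) and response Y."""
    m = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(m)]
    return XtX, XtY

# Hypothetical data: intercept column plus one regressor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [1, 2, 2, 4]

XtX, XtY = normal_equations(X, Y)
beta = solve(XtX, XtY)                                   # (X'X)^{-1} X'Y
s2 = sum(y * y for y in Y) - sum(b * v for b, v in zip(beta, XtY))
residual_ss = sum((y - sum(b * x for b, x in zip(beta, row))) ** 2
                  for row, y in zip(X, Y))               # (Y - X beta)'(Y - X beta)
print(beta, s2, residual_ss)
```

The last two printed values agree, which is the identity $s^2 = (Y - X\beta)'(Y - X\beta)$ evaluated at the least square solution.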
5.5.2 Optimization of quadratic forms


Consider a real quadratic form $q = X'AX$, $A = A'$ where X is n x 1 and A is n x n. We
have seen that there exists an orthonormal matrix P such that $A = P\Lambda P'$ where $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$
with $\lambda_1,\ldots,\lambda_n$ being the eigenvalues of A. Then

$$q = X'AX = X'P\Lambda P'X = Y'\Lambda Y = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2, \qquad Y = P'X,\ Y' = (y_1,\ldots,y_n). \qquad (5.5.1)$$

If $\lambda_j \ge 0$, $j = 1,\ldots,n$ then $q = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2$, $q > 0$, represents a hyperellipsoid in
r-space where r is the number of nonzero $\lambda_j$'s. If $\lambda_j = 1$, $j = 1,\ldots,r$, $\lambda_{r+1} = 0 = \cdots = \lambda_n$
then q is a hypersphere in r-space with radius $\sqrt{q}$.


Definition 5.5.1 (Hyperellipsoid). The equation $X'AX \le c$ when $c > 0$, $A = A' > 0$ (positive
definite) represents all points inside and on a hyperellipsoid.


Its standard form is available by rotating the axes of coordinates or by an orthogonal
transformation $Y = P'X$, $PP' = I$, $P'P = I$ where

$$\Lambda = P'AP = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$$

where $\lambda_1,\ldots,\lambda_n$ are the eigenvalues of A. When $A > 0$ all $\lambda_j$'s are positive. The standard
form of a hyperellipsoid is

$$c = X'AX = \lambda_1y_1^2 + \cdots + \lambda_ny_n^2$$

where $\sqrt{c/\lambda_1},\ldots,\sqrt{c/\lambda_n}$ are the semi-axes.

If there are no conditions or restrictions on X and A in $q = X'AX$ then from (5.5.1)
we can see that q can go to $+\infty$ or $-\infty$ or to an indeterminate form depending upon
the eigenvalues $\lambda_1,\ldots,\lambda_n$. Hence the unrestricted maxima/minima problem is not that
meaningful. Let us restrict X to a hypersphere of radius unity, that is $X'X = 1$, or,

$$x_1^2 + x_2^2 + \cdots + x_n^2 = 1, \qquad X' = (x_1,\ldots,x_n).$$


Let us try to optimize $q = X'AX$ subject to the condition $X'X = 1$. Let $\lambda$ be a Lagrangian
multiplier and consider the function

$$q_1 = X'AX - \lambda(X'X - 1).$$

Since $X'X - 1 = 0$ we have not made any change in q, $q_1 = q$, where $\lambda$ is an arbitrary
scalar. Using the vector derivative operator $\frac{\partial}{\partial X}$ (see Chapter 1)

$$\frac{\partial}{\partial X}q_1 = O \;\Rightarrow\; \frac{\partial}{\partial X}[X'AX - \lambda(X'X - 1)] = O \;\Rightarrow\; 2AX - 2\lambda X = O \;\Rightarrow\; AX = \lambda X \qquad (5.5.2)$$

(see the results on the operator $\frac{\partial}{\partial X}$ operating on linear and quadratic forms from Chapter 1).
From (5.5.2) it is clear that $\lambda$ is an eigenvalue of A and X the corresponding eigenvector.
Premultiply (5.5.2) by $X'$ to obtain

$$X'AX = \lambda X'X = \lambda \quad\text{when } X'X = 1.$$

Therefore the maximum value of $X'AX$ is the largest eigenvalue of A and the minimum
value of $X'AX$ is the smallest eigenvalue of A. We can state these as follows:


(ii) $\max_{X'X=1}[X'AX] = \lambda_n$ = largest eigenvalue of A.
(iii) $\min_{X'X=1}[X'AX] = \lambda_1$ = smallest eigenvalue of A.


If we did not restrict X to a hypersphere of radius 1 let us see what happens if we simply
say that $X'X < \infty$ (the length is finite). Let $X'X = c$ for some $c < \infty$. Then proceeding
as before we end up with (5.5.2). Then premultiplying both sides by $X'$ we have

$$\frac{X'AX}{X'X} = \lambda.$$

Therefore we have the following results:


(iv) $\max\left[\frac{X'AX}{X'X}\right] = \lambda_n$ = largest eigenvalue of A if $X'X < \infty$.
(v) $\min\left[\frac{X'AX}{X'X}\right] = \lambda_1$ = smallest eigenvalue of A if $X'X < \infty$.
(vi) $\lambda_1 \le \frac{X'AX}{X'X} = \frac{\lambda_1y_1^2 + \cdots + \lambda_ny_n^2}{y_1^2 + \cdots + y_n^2} \le \lambda_n$

where $Y = P'X$, $PP' = I$, $P'P = I$, $Y' = (y_1,\ldots,y_n)$.
For convenience, $\lambda_1$ is taken as the smallest and $\lambda_n$ the largest eigenvalue of A. The results
in (iii) and (iv) are known as Rayleigh's principle and $\frac{X'AX}{X'X}$ the Rayleigh's quotient
in physics.
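
Results (iv) to (vi) can be illustrated by scanning directions in the plane. The sketch below uses a hypothetical 2 x 2 symmetric matrix with eigenvalues 1 and 3 (chosen for this illustration, not taken from the text) and evaluates the Rayleigh quotient on a grid of unit vectors.

```python
import math

def rayleigh(A, x):
    """Rayleigh quotient x'Ax / x'x for a square matrix A and a nonzero vector x."""
    Ax = [sum(a * v for a, v in zip(row, x)) for row in A]
    return sum(v * w for v, w in zip(x, Ax)) / sum(v * v for v in x)

# Hypothetical example: eigenvalues 1 (eigenvector (1,-1)) and 3 (eigenvector (1,1)).
A = [[2.0, 1.0],
     [1.0, 2.0]]

# Scan unit vectors (cos t, sin t) in one-degree steps.
values = [rayleigh(A, [math.cos(t), math.sin(t)])
          for t in (2 * math.pi * k / 360 for k in range(360))]

print(min(values), max(values))      # close to the smallest and largest eigenvalues
print(rayleigh(A, [5.0, 5.0]))       # the quotient does not depend on the length of x
```

Here the quotient equals $2 + \sin 2t$ along the direction at angle t, so its extremes are attained at the eigenvector directions of 45 and 135 degrees.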


Example 5.5.3. Evaluate the Rayleigh's quotient for the quadratic form $X'AX$,
$A = (a_{ij})$ if (a) $X' = (1,0,\ldots,0)$, (b) $X' = (0,0,\ldots,1,0,\ldots,0)$, j-th element is 1.


Solution 5.5.3. (a) When $X' = (1,0,\ldots,0)$, $X'AX = a_{11}$ and $X'X = 1$. Then

$$\frac{X'AX}{X'X} = a_{11}.$$

Similarly for (b), $\frac{X'AX}{X'X} = a_{jj}$.
5.5.3 Optimization of a quadratic form with quadratic form constraints


Let us generalize the above problem a little further. Suppose we wish to optimize $X'AX$
subject to the condition $X'BX = c < \infty$. Consider

$$\phi = X'AX - \lambda(X'BX - c)$$

where $\lambda$ is a Lagrangian multiplier.

$$\frac{\partial\phi}{\partial X} = O \;\Rightarrow\; AX = \lambda BX. \qquad (5.5.3)$$

This means that for a non-null solution X we must have

$$|A - \lambda B| = 0 \;\Rightarrow\; |B^{-1}A - \lambda I| = 0 \ \text{ if } |B| \ne 0 \quad\text{or}\quad \left|A^{-1}B - \tfrac{1}{\lambda}I\right| = 0 \ \text{ if } |A| \ne 0.$$

Thus the maximum value of $X'AX$ occurs at the largest root, $\lambda_n$, of the determinantal
equation $|A - \lambda B| = 0$ which is also the largest eigenvalue of $B^{-1}A$ if $|B| \ne 0$. From (5.5.3),

$$X'AX = \lambda X'BX = \lambda c$$

and thus we have the following results:


(vii) $\max_{X'BX=c}[X'AX] = \lambda_nc$, $\lambda_n$ = largest root of $|A - \lambda B| = 0$.

(viii) $\min_{X'BX=c}[X'AX] = \lambda_1c$, $\lambda_1$ = smallest root of $|A - \lambda B| = 0$.


Observe that if A and B are at least positive semi-definite then we are looking at the
slicing of the hyperellipsoid $X'BX = c$ with an arbitrary hyperellipsoid $X'AX = q$ for
all q. Note also that the above procedure and the results (vii) and (viii) correspond to
the canonical correlation analysis when $B = B' > 0$ and $A = A' > 0$.


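Example 5.5.4. Optimize $2x_1^2 + 3x_2^2 + 2x_1x_2$ subject to the condition $x_1^2 + 2x_2^2 + 2x_1x_2 = 3$.


Solution 5.5.4. Writing the quadratic forms in matrix notation we have

$$2x_1^2 + 3x_2^2 + 2x_1x_2 = [x_1,x_2]\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = X'AX, \qquad A = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix},\ X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix};$$

$$x_1^2 + 2x_2^2 + 2x_1x_2 = [x_1,x_2]\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \;\Rightarrow\; X'BX = 3, \qquad B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.$$

Consider the determinantal equation

$$|A - \lambda B| = 0 \;\Rightarrow\; \left|\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix} - \lambda\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\right| = 0 \;\Rightarrow\; (2-\lambda)(3-2\lambda) - (1-\lambda)^2 = 0 \;\Rightarrow\; \lambda^2 - 5\lambda + 5 = 0$$

$$\Rightarrow\; \lambda_1 = \frac{5 - \sqrt{5}}{2}, \qquad \lambda_2 = \frac{5 + \sqrt{5}}{2}.$$

Therefore the maximum value of $X'AX$ is

$$\lambda_2c = 3\times\frac{5 + \sqrt{5}}{2} = \frac{3}{2}(5 + \sqrt{5})$$

and the minimum value of $X'AX$ is

$$\lambda_1c = \frac{3}{2}(5 - \sqrt{5}).$$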
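
The roots in Solution 5.5.4 and the resulting extreme values can be cross-checked numerically. The sketch below expands $|A - \lambda B| = 0$ into a quadratic for symmetric 2 x 2 matrices and then scans points on the constraint curve $X'BX = c$; the function names and the grid size are illustrative assumptions.

```python
import math

def roots_2x2(A, B):
    """Roots of |A - lam*B| = 0 for symmetric 2x2 A, B (real roots assumed).
    Expands to det(B) lam^2 - (a11 b22 + a22 b11 - 2 a12 b12) lam + det(A) = 0."""
    qa = B[0][0] * B[1][1] - B[0][1] ** 2
    qb = -(A[0][0] * B[1][1] + A[1][1] * B[0][0] - 2 * A[0][1] * B[0][1])
    qc = A[0][0] * A[1][1] - A[0][1] ** 2
    disc = math.sqrt(qb * qb - 4 * qa * qc)
    return sorted([(-qb - disc) / (2 * qa), (-qb + disc) / (2 * qa)])

def quad(M, d):
    """Quadratic form d'Md for a symmetric 2x2 matrix M."""
    return M[0][0] * d[0] ** 2 + 2 * M[0][1] * d[0] * d[1] + M[1][1] * d[1] ** 2

A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 1.0], [1.0, 2.0]]
c = 3.0

lam_min, lam_max = roots_2x2(A, B)

# Brute-force check: rescale each direction d so that X'BX = c, then X'AX = c d'Ad / d'Bd.
values = []
for k in range(36000):
    t = math.pi * k / 36000            # d and -d give the same value
    d = (math.cos(t), math.sin(t))
    values.append(c * quad(A, d) / quad(B, d))

print(lam_min * c, lam_max * c)        # (3/2)(5 - sqrt 5) and (3/2)(5 + sqrt 5)
print(min(values), max(values))        # grid search, agrees up to grid resolution
```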


5.5.4 Optimization of a quadratic form with linear constraints 


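Consider the optimization of $X'AX$ subject to the condition $X'b = c$ where b is an n x 1
known vector and c is a known scalar. If $A = A' > 0$ then $q = X'AX$ with $q > 0$ a constant,
is a hyperellipsoid and this ellipsoid is cut by the plane $X'b = c$. We get another
ellipsoid in a lower dimension. For example if the ellipsoid is

$$q = 3x_1^2 + x_2^2 + x_3^2 - 2x_1x_2 = [x_1,x_2,x_3]\begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = X'AX,$$

$$X = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \qquad A = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

and if the condition is

$$x_1 + x_2 + x_3 = 1 \;\Rightarrow\; X'b = 1, \qquad b' = (1,1,1),$$

then from the condition we have $x_3 = 1 - x_1 - x_2$. Substituting in $X'AX$ we have

$$\begin{aligned}
q &= 3x_1^2 + x_2^2 + (1 - x_1 - x_2)^2 - 2x_1x_2\\
  &= 4x_1^2 + 2x_2^2 - 2x_1 - 2x_2 + 1\\
  &= 4\left(x_1 - \frac{1}{4}\right)^2 + 2\left(x_2 - \frac{1}{2}\right)^2 + \frac{1}{4}.
\end{aligned}$$

It is an ellipse with the center at $(\frac{1}{4},\frac{1}{2})$ and the semi-axes proportional to $\frac{1}{2}$ and $\frac{1}{\sqrt{2}}$. For
example if $q - \frac{1}{4} = 1$ then the semi-axes are $\frac{1}{2}$ and $\frac{1}{\sqrt{2}}$. Then the critical point is $(\frac{1}{4},\frac{1}{2})$
and the maximum value is $\frac{1}{2} + \frac{1}{\sqrt{2}}$ and the minimum value is $\frac{1}{2} - \frac{1}{\sqrt{2}}$.

But our original aim was to optimize $X'AX$ and then q is a function of $x_1, x_2$, say
$q(x_1,x_2)$. Then

$$q(x_1,x_2) = 4\left(x_1 - \frac{1}{4}\right)^2 + 2\left(x_2 - \frac{1}{2}\right)^2 + \frac{1}{4}.$$

Then going through the usual maximization process we see that there is a minimum
at $(\frac{1}{4},\frac{1}{2})$ for finite X and the minimum value is $\frac{1}{4}$.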

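Substitution may not be always convenient and hence we need a general procedure
to handle such problems. Consider the method of Lagrangian multipliers and
consider

$$\phi = X'AX - 2\lambda(X'b - c)$$

where $2\lambda$ is a Lagrangian multiplier. Then

$$\frac{\partial\phi}{\partial X} = O \;\Rightarrow\; 2AX - 2\lambda b = O \;\Rightarrow\; AX = \lambda b.$$

If $|A| \ne 0$ then

$$X = \lambda A^{-1}b \;\Rightarrow\; X'AX = \lambda X'b = \lambda c, \qquad c = b'X = \lambda b'A^{-1}b \;\Rightarrow\; \lambda = \frac{c}{b'A^{-1}b}.$$

When A is positive definite the maximum is at $\infty$ and then for finite X we have only a
minimum for $X'AX$ and the minimum value is $\frac{c^2}{b'A^{-1}b}$. We can verify this result for our
example above. In our illustrative example

$$A^{-1} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 3/2 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

and then $b'A^{-1}b = 4$. Since c = 1 we get the minimum value as $\frac{1}{4}$ which is what is seen
earlier.

(ix) $\min_{X'b=c}[X'AX] = \frac{c^2}{b'A^{-1}b}$ when $A = A' > 0$.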

5.5.5 Optimization of bilinear forms with quadratic constraints 


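Consider a bilinear form $X'AY$ where X is p x 1, Y is q x 1 and A is p x q. For convenience
let $p \le q$. Consider the positive definite quadratic form constraints in X and Y. Without
loss of generality the constraints can be written as $X'BX = 1$, $Y'CY = 1$, $B = B' > 0$,
$C = C' > 0$. Our aim is to optimize $X'AY$ subject to the conditions $X'BX = 1$ and $Y'CY = 1$.
Let

$$\phi = X'AY - \frac{1}{2}\lambda_1(X'BX - 1) - \frac{1}{2}\lambda_2(Y'CY - 1)$$

where $\frac{1}{2}\lambda_1$ and $\frac{1}{2}\lambda_2$ are Lagrangian multipliers. Then

$$\frac{\partial\phi}{\partial X} = O \;\Rightarrow\; AY - \lambda_1BX = O \;\Rightarrow\; AY = \lambda_1BX \;\Rightarrow\; X'AY = \lambda_1X'BX = \lambda_1.$$

For taking the partial derivative with respect to Y write $X'AY = Y'A'X$.

$$\frac{\partial\phi}{\partial Y} = O \;\Rightarrow\; A'X - \lambda_2CY = O \;\Rightarrow\; A'X = \lambda_2CY \;\Rightarrow\; Y'A'X = \lambda_2Y'CY = \lambda_2.$$

That is, $\lambda_1 = \lambda_2$ ($= \lambda$, say). The maximum or minimum is given by $\lambda$. Writing the equations
once again we have

$$-\lambda BX + AY = O \quad\text{and}\quad A'X - \lambda CY = O \;\Rightarrow\; \begin{pmatrix} -\lambda B & A \\ A' & -\lambda C \end{pmatrix}\begin{pmatrix} X \\ Y \end{pmatrix} = O.$$

In order for this to have a non-null solution for $\binom{X}{Y}$ the coefficient matrix must be
singular. That is,

$$\begin{vmatrix} -\lambda B & A \\ A' & -\lambda C \end{vmatrix} = 0.$$

Evaluating the left side with the help of partitioned matrices (see Section 2.7 in Chapter 2)
we have

$$|-\lambda B|\,|-\lambda C - A'(-\lambda B)^{-1}A| = 0 \;\Rightarrow\; \left|-\lambda C + \frac{1}{\lambda}A'B^{-1}A\right| = 0 \;\Rightarrow\; |A'B^{-1}A - \lambda^2C| = 0. \qquad \text{(a)}$$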


There $\nu = \lambda^2$ is a root of the determinantal equation (a) above. Thus we have the following
results:


(x) $\max_{X'BX=1,\ Y'CY=1,\ B=B'>0,\ C=C'>0}[|X'AY|] = |\lambda_p|$,

where $\lambda_p^2$ is the largest root of the equation (a).


(xi) $\min_{X'BX=1,\ Y'CY=1,\ B=B'>0,\ C=C'>0}[|X'AY|] = |\lambda_1|$,

where $\lambda_1^2$ is the smallest root of the equation (a).


Note that $\lambda^2$ can also be written as an eigenvalue problem. Equations can be written
as

$$|C^{-1}A'B^{-1}A - \lambda^2I| = 0 \quad\text{or}\quad |C^{-\frac{1}{2}}A'B^{-1}AC^{-\frac{1}{2}} - \lambda^2I| = 0$$

where $C^{\frac{1}{2}}$ is the symmetric positive definite square root of $C = C' > 0$. Either $\lambda^2$ can be
looked upon as an eigenvalue of $C^{-1}A'B^{-1}A$ or of $C^{-\frac{1}{2}}A'B^{-1}AC^{-\frac{1}{2}}$. From symmetry it
follows that $\lambda^2$ is also an eigenvalue of $B^{-1}AC^{-1}A'$ or of $B^{-\frac{1}{2}}AC^{-1}A'B^{-\frac{1}{2}}$.


(xii) $\max_{X'BX=1,\ Y'CY=1,\ B=B'>0,\ C=C'>0}[|X'AY|] = |\lambda_p|$

where $\lambda_p^2$ is the largest eigenvalue of $C^{-1}A'B^{-1}A$ or that of $B^{-1}AC^{-1}A'$ or of
$C^{-\frac{1}{2}}A'B^{-1}AC^{-\frac{1}{2}}$ or of $B^{-\frac{1}{2}}AC^{-1}A'B^{-\frac{1}{2}}$ or the root of the determinantal equations

$$|A'B^{-1}A - \lambda^2C| = 0 \quad\text{or of}\quad |AC^{-1}A' - \lambda^2B| = 0.$$

(xiii) $\min_{X'BX=1,\ Y'CY=1,\ B=B'>0,\ C=C'>0}[|X'AY|] = |\lambda_1|$

where $\lambda_1^2$ is the smallest eigenvalue of the matrices or the smallest root of the determinantal
equations as in (xii) above.

An important application of the results in (xii) and (xiii) is already discussed in connection
with the canonical correlation analysis and a particular case of which is the
multiple correlation analysis. The structure in partial correlation analysis in statistics
is also more or less the same.


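Example 5.5.5. Look for maxima/minima of

$$x_1y_1 + 2x_1y_2 - x_1y_3 + x_2y_1 + x_2y_2 + x_2y_3$$

subject to the conditions

$$x_1^2 + 2x_2^2 + 2x_1x_2 = 1 \quad\text{and}\quad 2y_1^2 + y_2^2 + y_3^2 + 2y_1y_3 = 1.$$

Solution 5.5.5. Writing these in matrix notations, the quantity to be optimized is

$$X'AY, \qquad A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 1 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}$$

and the conditions are

$$X'BX = 1, \quad B = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad Y'CY = 1, \quad C = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}, \quad C^{-1} = \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 2 \end{pmatrix}.$$

Since the order of B is 2 or B is 2 x 2 whereas C is 3 x 3 we consider the determinantal
equation

$$|AC^{-1}A' - \nu B| = 0, \qquad \nu = \lambda^2,$$

$$AC^{-1}A' = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 9 & 1 \\ 1 & 2 \end{pmatrix};$$

$$|AC^{-1}A' - \nu B| = 0 \;\Rightarrow\; \left|\begin{pmatrix} 9 & 1 \\ 1 & 2 \end{pmatrix} - \nu\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\right| = 0 \;\Rightarrow\; \begin{vmatrix} 9-\nu & 1-\nu \\ 1-\nu & 2-2\nu \end{vmatrix} = 0 \;\Rightarrow\; (1-\nu)(17-\nu) = 0.$$

Therefore $\nu_1 = 17$ and $\nu_2 = 1$ are the roots. The largest $\lambda$ is $\sqrt{17}$ and the smallest positive
value of $\lambda$ = 1. Hence the maximum value of $X'AY$ is $\sqrt{17}$ and the minimum absolute value
of $X'AY$ is 1.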
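
The matrix $AC^{-1}A'$ and the roots $\nu = 1, 17$ of Solution 5.5.5 can be recomputed mechanically. The sketch below avoids forming $C^{-1}$ explicitly by solving $Cz = a_j$ for each row $a_j$ of A; the helper `solve` and the variable names are assumptions made for this illustration.

```python
import math
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions.
    Assumes A is square and nonsingular."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

A = [[1, 2, -1], [1, 1, 1]]                 # p x q with p = 2, q = 3
B = [[1, 1], [1, 2]]
C = [[2, 0, 1], [0, 1, 0], [1, 0, 1]]

# Z[j] = C^{-1} a_j where a_j is the j-th row of A, so (A C^{-1} A')_{ij} = a_i . Z[j].
Z = [solve(C, row) for row in A]
M = [[sum(ai * zj for ai, zj in zip(A[i], Z[j])) for j in range(2)] for i in range(2)]

# |M - nu*B| = 0 expanded as a quadratic in nu (symmetric 2x2 case).
qa = B[0][0] * B[1][1] - B[0][1] ** 2
qb = -(M[0][0] * B[1][1] + M[1][1] * B[0][0] - 2 * M[0][1] * B[0][1])
qc = M[0][0] * M[1][1] - M[0][1] ** 2
disc = math.sqrt(float(qb * qb - 4 * qa * qc))
nu = sorted([(float(-qb) - disc) / (2 * qa), (float(-qb) + disc) / (2 * qa)])

print(M, nu, [math.sqrt(v) for v in nu])    # roots nu and the corresponding |lambda|
```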
In other problems in statistics, especially in conditional distributions and when
searching for minimum variance estimators in a conditional space, the problem
is something like maximizing a bilinear form of the type $X'AY$, where one of
the vectors, either X or Y, is fixed (given by the conditionality assumption), subject
to a quadratic form condition involving the other vector, usually something
like the variance of a linear function in that vector is 1 which reduces to the form
$Y'CY = 1$, $C = C' > 0$ if Y is the variable vector. For achieving such a maximization
we can use a different approach based on Cauchy-Schwarz inequality also. Suppose
we wish to maximize $X'AY$ subject to the condition $Y'CY = 1$, $C = C' > 0$ and X fixed.
Then

$$X'AY = (X'AC^{-\frac{1}{2}})(C^{\frac{1}{2}}Y) \le \sqrt{X'AC^{-1}A'X}\,\sqrt{Y'CY} = \sqrt{X'AC^{-1}A'X}$$

since $Y'CY = 1$, by Cauchy-Schwarz inequality, where $C^{\frac{1}{2}}$ is the symmetric positive
definite square root of C. Thus we have the following result:


(xiv) $\max_{Y'CY=1,\ C=C'>0,\ X\ \text{fixed}}[X'AY] = \sqrt{X'AC^{-1}A'X}$.


(xv) $\max_{X'BX=1,\ B=B'>0,\ Y\ \text{fixed}}[X'AY] = \sqrt{Y'A'B^{-1}AY}$.


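Example 5.5.6 (Best linear predictors). A farmer suspects that the yield of corn, y, is
a linear function of the amount of a certain fertilizer used, $x_1$, and the amount of water
supplied $x_2$. What is the best linear function of $x_1$ and $x_2$, $u = a_1x_1 + a_2x_2$, to predict y,
best in the sense that the variance of u is 1 and the covariance of u with y is maximum.
Evaluate this best linear predictor as well as the maximum covariance if the following
items are available from past experience. $\mathrm{Var}(x_1) = 2$, $\mathrm{Var}(x_2) = 1$, $\mathrm{Cov}(x_1,x_2) = 1$,
$\mathrm{Cov}(y,x_1) = 1$, $\mathrm{Cov}(y,x_2) = -1$, $\mathrm{Var}(y) = 3$ where Var(·) and Cov(·) denote the variance
of (·) and covariance of (·) respectively.


Solution 5.5.6. Let

$$a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where a is an arbitrary coefficient vector for an arbitrary linear function $u = a'X = a_1x_1 + a_2x_2$.
Since we are dealing with variances and covariances, assume without loss
of generality, that the mean values are zeros. Then

$$\mathrm{Var}(u) = [a_1,a_2]\begin{pmatrix} \mathrm{Var}(x_1) & \mathrm{Cov}(x_1,x_2) \\ \mathrm{Cov}(x_1,x_2) & \mathrm{Var}(x_2) \end{pmatrix}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a'Ba, \qquad B = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},$$

$$\mathrm{Cov}(y,u) = \mathrm{Cov}(y,a'X) = a'\,\mathrm{Cov}(y,X) = [a_1,a_2]\begin{pmatrix} \mathrm{Cov}(y,x_1) \\ \mathrm{Cov}(y,x_2) \end{pmatrix} = [a_1,a_2]\begin{pmatrix} 1 \\ -1 \end{pmatrix}.$$

Our problem is to maximize Cov(y,u) subject to the condition Var(u) = 1. For convenience
let

$$V_{21} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad V_{22} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad V_{12} = V_{21}',$$

then

$$\mathrm{Var}(u) = a'V_{22}a = 1$$

and

$$\mathrm{Cov}(y,u) = a'V_{21} = (a'V_{22}^{\frac{1}{2}})(V_{22}^{-\frac{1}{2}}V_{21}) \le \sqrt{a'V_{22}a}\,\sqrt{V_{12}V_{22}^{-1}V_{21}} = \sqrt{V_{12}V_{22}^{-1}V_{21}}$$

since $a'V_{22}a = 1$. Hence the maximum covariance is

$$\sqrt{V_{12}V_{22}^{-1}V_{21}} = \left\{[1,-1]\begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}^{\frac{1}{2}} = \sqrt{5}.$$

From Cauchy-Schwarz inequality the maximum is attained when $(a'V_{22}^{\frac{1}{2}})$ and $(V_{22}^{-\frac{1}{2}}V_{21})$
are linear functions of each other. That is,

$$a'V_{22}^{\frac{1}{2}} = k_1V_{12}V_{22}^{-\frac{1}{2}} + k_2$$

where $k_1$ is a scalar and $k_2$ is a vector and the best predictor is

$$a'X = [k_1V_{12}V_{22}^{-\frac{1}{2}} + k_2]V_{22}^{-\frac{1}{2}}X.$$

With the help of the conditions that $E(X) = O$, $\mathrm{Var}(a'X) = a'V_{22}a = 1$ and the maximum
covariance is $\mathrm{Cov}(y,u) = \sqrt{V_{12}V_{22}^{-1}V_{21}}$ we have $k_1 = 1$ and $k_2 = O$. Thus the best predictor
is given by

$$u = a'X = V_{12}V_{22}^{-1}X$$

and the maximum covariance is given by $\sqrt{V_{12}V_{22}^{-1}V_{21}}$.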
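
The numbers in Example 5.5.6 can be checked directly. The sketch below computes $V_{22}^{-1}V_{21}$ and the bound $\sqrt{V_{12}V_{22}^{-1}V_{21}}$, and then, as an assumption of this sketch, rescales the coefficient vector by that bound so that the constraint Var(u) = 1 holds exactly when the variance and covariance are evaluated. The helper `solve` and the variable names are hypothetical.

```python
import math
from fractions import Fraction as F

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact fractions.
    Assumes A is square and nonsingular."""
    n = len(A)
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]

V22 = [[2, 1], [1, 1]]      # covariance matrix of (x1, x2)
V21 = [1, -1]               # covariances of y with x1 and x2

w = solve(V22, V21)                                   # V22^{-1} V21
bound_sq = sum(v * wi for v, wi in zip(V21, w))       # V12 V22^{-1} V21
max_cov = math.sqrt(bound_sq)

# Coefficients rescaled to unit variance (assumption of this sketch).
a = [float(wi) / max_cov for wi in w]
var_u = sum(a[i] * V22[i][j] * a[j] for i in range(2) for j in range(2))
cov_yu = sum(ai * v for ai, v in zip(a, V21))
print(w, bound_sq, max_cov, a, var_u, cov_yu)
```

With this scaling the covariance of u with y equals the Cauchy-Schwarz bound while Var(u) = 1, so the bound is attained.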


Exercises 5.5 
5.5.1. Look for maxima/minima of the following functions:

$$f(x,y) = 3x^2 + 2y^2 + 2xy - 4x - 5y + 10,$$
$$g(x,y) = 2x^2 + y^2 - 2xy - x - y + 5.$$

5.5.2. Repeat Exercise 5.5.1 without using calculus.

5.5.3. Look for maxima/minima of the following functions:


(a)  3xvxi-2áó-2x)5 
subject to x? 4 x2 * x3 =1, 
(b 3x? +x3 + 23 - 2x1% 
subject to 2x? + x} + 2x2 =1. 
5.5.4. Let x,,...,x, ben real numbers and a an unknown quantity. What is a so that 
Yt lx; -alis a minimum? [This principle can be stated as “the mean absolute devia- 
tions is least when the deviations are taken from (?)".] 


5.5.5. Let $U_1,\dots,U_k$ and $V$ be $n \times 1$ vectors of real numbers. Let there exist an $X$ such that $U_j'X \ge 0$ for $j = 1,\dots,k$. Then show that the necessary and sufficient condition that $V'X \ge 0$ for all $X$ such that $U_j'X \ge 0$ is that $V$ can be written as $V = \sum_{j=1}^{k} a_jU_j$, $a_j \ge 0$. [This is known as Farkas' lemma.]


5.5.6. Gramian matrix. Any matrix $A$ which can be written as $A = B'B$ is called a Gramian matrix. For a Gramian matrix $A$ and $n \times 1$ real vectors $X$ and $Y$ show that

$$(X'AY)^2 \le (X'AX)(Y'AY)$$

with equality when $X$ is proportional to $Y$, and

$$(X'Y)^2 \le (X'AX)(Y'A^{-1}Y)$$

if $A^{-1}$ exists, with equality when $X$ is proportional to $A^{-1}Y$.
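5.5.7. Let $X' = (x_1,\dots,x_n)$ and $Y' = (y_1,\dots,y_n)$; then show that

$$\Big(\sum_{j=1}^{n}x_j^2\Big)\Big(\sum_{j=1}^{n}y_j^2\Big) - \Big(\sum_{j=1}^{n}x_jy_j\Big)^2 = \sum_{i<j}(x_iy_j - x_jy_i)^2.$$

[This is known as Lagrange identity.]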


5.5.8. For a nonsingular $n \times n$ matrix $A = (a_{ij})$ show that

$$|A| \le 1 \quad\text{if}\quad \sum_{j=1}^{n}a_{ij}^2 = 1,\quad i = 1,\dots,n$$

and that

$$|A| \le \delta^n n^{n/2} \quad\text{if } |a_{ij}| \le \delta \text{ for all } i \text{ and } j.$$

5.5.9. Let $A = (a_{ij})$ be an $n \times n$ real symmetric positive definite matrix. Show that

$$|A| \le a_{11}\,(\operatorname{cof}(a_{11}))$$

where $(\operatorname{cof}(a_{11}))$ denotes the cofactor of $a_{11}$, and

$$|A| \le a_{11}a_{22}\cdots a_{nn}.$$


5.5.10. Let $A$ be an $n \times n$ real symmetric positive definite matrix and $U$ and $X$ be $n$-vectors. Show that

$$\max_{X}\frac{(U'X)^2}{X'AX} = U'A^{-1}U.$$
5.5.11. Let $A$ be as in Exercise 5.5.10 and $B$ any $n \times n$ matrix. Then show that

$$\max_{X}\frac{(U'BX)^2}{X'AX} = U'BA^{-1}B'U.$$


5.5.12. Let X4, ..., X, be mutually orthogonal n x 1 vectors and A = A’ a real symmetric 
matrix. Then show that 


where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ are the eigenvalues of $A$.


5.6 Linear programming and nonlinear least squares 


In the previous sections we dealt with optimization of nonlinear functions with linear or nonlinear constraints. Suppose we wish to optimize (maximize/minimize) a linear function with linear constraints. Obviously methods based on calculus fail here. Geometrically, a linear function represents a line in a 2-dimensional space (plane) or a plane or hyperplane in n-dimensional Euclidean space, n ≥ 3. A line or plane stretches from −∞ to ∞ and hence there is no optimum value if the whole line or plane is taken into consideration. But if we confine our search for an optimum to a bounded convex region in a plane or space then there is always a maximum and a minimum for an arbitrary line or plane passing through that region. For example, let us try to look for maxima/minima of the linear function x + y in the region bounded by the following constraints: x ≥ 0, y ≥ 0, x − 3y ≥ −3, 3x + y ≤ 6.

The conditions x ≥ 0, y ≥ 0 imply non-negative values, that is, we are in the first quadrant. The condition x − 3y ≥ −3 means on or below the line x − 3y = −3 and 3x + y ≤ 6 means on or below the line 3x + y = 6. Thus we are looking for maxima/minima of x + y in the shaded region in Figure 5.6.1. Consider the equation x + y = c for various values of c, that is, move the line x + y = 0 parallel to itself. A few positions are shown in Figure 5.6.1. It is obvious that the maximum or minimum value of c is obtained at one of the corner points of our convex region, the shaded region in Figure 5.6.1. This, in fact, is a general property when the region is convex. The corner points are (0,0), (0,1), (1.5,1.5), (2,0). The values of x + y at these points are 0 + 0 = 0, 0 + 1 = 1, 1.5 + 1.5 = 3, 2 + 0 = 2 respectively. Hence the minimum value of x + y is zero and the maximum value is 3.

Figure 5.6.1: Optimization of a linear function.
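The corner-point property can be checked mechanically. The following is a minimal sketch, not a general linear programming solver: it intersects every pair of boundary lines, keeps the feasible intersections, and evaluates x + y there. The helper name `corner_points` and the encoding of each constraint as a triple (a, b, c) meaning a·x + b·y ≤ c are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
# The region is x >= 0, y >= 0, x - 3y >= -3, 3x + y <= 6.
constraints = [
    (-1, 0, 0),   # -x <= 0, i.e. x >= 0
    (0, -1, 0),   # -y <= 0, i.e. y >= 0
    (-1, 3, 3),   # -x + 3y <= 3, i.e. x - 3y >= -3
    (3, 1, 6),    # 3x + y <= 6
]

def corner_points(cons):
    """Intersect every pair of boundary lines and keep the feasible points."""
    points = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:          # parallel boundary lines never meet
            continue
        x = Fraction(c1 * b2 - c2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in cons):
            points.add((x, y))
    return sorted(points)

corners = corner_points(constraints)
values = {p: p[0] + p[1] for p in corners}   # objective x + y at each corner
print(corners)
print(min(values.values()), max(values.values()))
```

Exact rational arithmetic is used so that the feasibility tests at the boundary are not affected by rounding.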

In the above problem suppose we had one more condition that 3x + y < 2. Note 
that our region remains the same or in other words this new condition is superfluous 
in our problem. 


Example 5.6.1. A lunch counter in an office building plans to prepare two types of sandwiches for a particular day. Let x₁ and x₂ be the numbers. Then x₁ ≥ 0 and x₂ ≥ 0. The profit from the first type is $2 per sandwich and that from the second type is $3 per sandwich. It costs $1 and $2 each respectively to make these sandwiches and the operator does not want to allocate more than $100 for these two types of sandwiches for that day. That is, x₁ + 2x₂ ≤ 100. It takes 2 minutes each to prepare these sandwiches and the operator does not want to spend more than 2 hours in preparing them. That is, 2x₁ + 2x₂ ≤ 120. What should be the numbers x₁ and x₂ so as to maximize the profit, assuming that all the sandwiches will be sold?


Solution 5.6.1. The region where we want to maximize $2x_1 + 3x_2$ is the shaded region in Figure 5.6.2.

Figure 5.6.2: Linear function with linear constraints. The boundary lines shown are $x_1 + 2x_2 = 100$ and $2x_1 + 2x_2 = 120$.


The values of $2x_1 + 3x_2$ at the corner points (0,0), (60,0), (20,40), (0,50) are respectively 0, 120, 160, 150, and thus the maximum value is 160, occurring at (20,40). Hence the operator should make 20 of type 1 and 40 of type 2 for that day.


From the above examples it is clear that if only two variables are involved then the optimization problem can be solved graphically. But if we are in an n-space, n > 3, that is, if more than 3 variables are involved, then a graphical solution is not feasible. Even for n = 3 it is quite difficult to see graphically. Hence we need other methods. One such method, based on matrix considerations, is called the simplex method. This optimization problem involving linear functions with linear constraints is called a linear programming problem; the name has nothing to do with any special computer programme or computer language. There are many ways of explaining the simplex method. First, observe that an inequality of the type x − y ≥ 2 is the same as saying −x + y ≤ −2. The inequality is reversed by multiplying both sides by (−1). Hence if in some of the constraints the inequality is the other way around, it can be brought to the pattern of the remaining inequalities by the above procedure.


5.6.1 The simplex method 
Let us consider a problem of the following type: Maximize

$$f = c_1x_1 + \cdots + c_nx_n = C'X = X'C,\quad C' = (c_1,\dots,c_n),\quad X' = (x_1,\dots,x_n)$$

subject to the conditions

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &\le b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &\le b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &\le b_m\\
x_1 \ge 0,\ x_2 \ge 0,\ \dots,\ x_n &\ge 0
\end{aligned}\tag{5.6.1}$$

where the $c_j$'s, $b_j$'s, $a_{ij}$'s are all known constants. For convenience, we may use the following standard notation. The inequalities in (5.6.1) will be written in matrix notation as

$$AX \le b,\quad X \ge O,\quad b' = (b_1,\dots,b_m),\quad A = (a_{ij}). \tag{5.6.2}$$

Note that earlier we used a notation of the type $G > 0$ or $G \ge 0$ to denote positive definiteness and positive semi-definiteness of a matrix $G$. In (5.6.2) we are not talking about definiteness but only using a convenient way of writing all the inequalities in (5.6.1) together. Thus the problem can be stated as follows:

$$\text{Maximize } f = C'X \text{ subject to } AX \le b,\ X \ge O. \tag{5.6.3}$$


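Note that an inequality can be made into an equality by adding or subtracting a certain amount. For example,

$$a_{11}x_1 + \cdots + a_{1n}x_n \le b_1 \;\Rightarrow\; a_{11}x_1 + \cdots + a_{1n}x_n + y_1 = b_1,\quad y_1 \ge 0.$$

Thus by adding a non-negative quantity $y_1$ to the inequality an equality is obtained. Note that $y_1$ is an unknown quantity. Thus by adding $y_1, y_2, \dots, y_m$, where the quantities may all be different, to the inequalities in (5.6.1) we get $m$ linear equations:

$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n + y_1 &= b_1\\
a_{21}x_1 + \cdots + a_{2n}x_n + y_2 &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n + y_m &= b_m.
\end{aligned}$$

That is,

$$\begin{bmatrix}
a_{11} & \cdots & a_{1n} & 1 & 0 & \cdots & 0\\
a_{21} & \cdots & a_{2n} & 0 & 1 & \cdots & 0\\
\vdots & & \vdots & \vdots & \vdots & & \vdots\\
a_{m1} & \cdots & a_{mn} & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}x_1\\ \vdots\\ x_n\\ y_1\\ \vdots\\ y_m\end{bmatrix}
= \begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_m\end{bmatrix}.$$

This can be written in partitioned form as follows, augmenting the equation $f = C'X$:

$$\begin{bmatrix}A & I_m\\ C' & O\end{bmatrix}\begin{bmatrix}X\\ Y\end{bmatrix} = \begin{bmatrix}b\\ f\end{bmatrix},\quad Y' = (y_1,\dots,y_m). \tag{5.6.4}$$

Note that

$$f = C'X = c_1x_1 + \cdots + c_nx_n = c_1x_1 + \cdots + c_nx_n + 0y_1 + \cdots + 0y_m.$$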


Now, it is a matter of solving the system of linear equations in (5.6.4) for $X$, $Y$ and $f$. We have considered many methods of solving a system of linear equations. Let us try to solve the system in (5.6.4) by elementary operations on the left, that is, operating on the rows only. For this purpose we write only the coefficient matrix and the right side, separated by dotted lines. We may also separate the sub-matrices in the coefficient matrix by dotted lines. Write

$$\left[\begin{array}{c|c|c} A & I_m & b\\ \hline C' & O & f\end{array}\right]. \tag{5.6.5}$$

For the time being, let $m \le n$. In doing elementary operations on the rows our aim will be to reduce the above matrix format to the following form:

$$\left[\begin{array}{c|c|c} A & I_m & b\\ \hline C' & O & f\end{array}\right] \;\Rightarrow\; \left[\begin{array}{c|c|c} B,\ I_m & G & b_1\\ \hline -C_1',\ O & -C_2' & f - d\end{array}\right]. \tag{5.6.6}$$

In the original position of $A$ there is a matrix $B$, which may or may not be null, and an $m \times m$ identity matrix, or a matrix of the form $I_m, B$. Remember that the original $A$ was $m \times n$, $m \le n$. In the original position of $I_m$ there is an $m \times m$ matrix $G$. In the original position of the row vector $C'$ there is a row vector $-C_1'$ augmented with the null vector $O$ or the other way around. In the original position of the null vector there is a vector $-C_2'$, where the elements in $C_1$ and $C_2$ are non-negative. The original $f$ has gone to $f - d$ where $d$ is a known number. The original vector $b$ has changed to the known vector $b_1$, where the elements in $b_1$ are non-negative. We will show that the maximum value of $C'X$ is $d$ and the corresponding solution of $X$ is such that the first $n - m$ elements are zeros and the last $m$ elements are the $m$ elements of $b_1$. If the final form is $I_m, B$ instead of $B, I_m$ then take the last $n - m$ elements in $X$ as zeros and the first $m$ as those of $b_1$. Before interpreting the form in (5.6.6) let us do a numerical example.


Example 5.6.2. Redo the problem in Example 5.6.1 by using matrix method and by 
reducing to the format in (5.6.6). 


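Solution 5.6.2. Our problem is to maximize $2x_1 + 3x_2$ subject to the conditions

$$x_1 + 2x_2 \le 100 \quad\text{and}\quad x_1 + x_2 \le 60.$$

Writing in matrix notation, we want to maximize

$$f = C'X = 2x_1 + 3x_2,\quad C' = (2,3),\quad X' = (x_1,x_2)$$

subject to the conditions

$$\begin{bmatrix}1 & 2\\ 1 & 1\end{bmatrix}\begin{bmatrix}x_1\\ x_2\end{bmatrix} \le \begin{bmatrix}100\\ 60\end{bmatrix}.$$

Introduce the dummy variables $y_1$ and $y_2$ to write

$$\begin{aligned}
x_1 + 2x_2 + y_1 &= 100\\
x_1 + x_2 + y_2 &= 60\\
2x_1 + 3x_2 &= f.
\end{aligned}$$

Writing the coefficient matrix and the right side we have

$$\left[\begin{array}{c|c|c} A & I_2 & b\\ \hline C' & O & f\end{array}\right] = \left[\begin{array}{cc|cc|c} 1 & 2 & 1 & 0 & 100\\ 1 & 1 & 0 & 1 & 60\\ \hline 2 & 3 & 0 & 0 & f\end{array}\right].$$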


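We will denote the elementary row operations with the help of our usual notation. Add (−1) times the first row to the second row and (−2) times the first row to the third row. That is,

$$-(1)+(2);\ -2(1)+(3) \;\Rightarrow\; \left[\begin{array}{cc|cc|c} 1 & 2 & 1 & 0 & 100\\ 0 & -1 & -1 & 1 & -40\\ \hline 0 & -1 & -2 & 0 & f-200\end{array}\right].$$

Now, add (−1) times the second row to the third row, 2 times the second row to the first row and then multiply the second row by (−1). That is,

$$-(2)+(3);\ 2(2)+(1);\ -(2) \;\Rightarrow\; \left[\begin{array}{cc|cc|c} 1 & 0 & -1 & 2 & 20\\ 0 & 1 & 1 & -1 & 40\\ \hline 0 & 0 & -1 & -1 & f-160\end{array}\right] = \left[\begin{array}{c|c|c} I_2 & G & b_1\\ \hline O & -C_2' & f-d\end{array}\right].$$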

Writing back in terms of the original variables we have

$$\begin{aligned}
x_1 - y_1 + 2y_2 &= 20\\
x_2 + y_1 - y_2 &= 40\\
-y_1 - y_2 &= f - 160.
\end{aligned}$$

A feasible solution is $x_1 = 20$, $x_2 = 40$, $y_1 = 0$, $y_2 = 0$, $f - 160 = 0$, that is, $f = 160$. Since $y_1 \ge 0$ and $y_2 \ge 0$, the last equation gives $f = 160 - y_1 - y_2 \le 160$, so no feasible point can do better. Thus the maximum is attained at $x_1 = 20$ and $x_2 = 40$ and the maximum value of $f$ is 160. This is exactly what we had seen in Example 5.6.1. We may also observe that $C_2'b = (1)(100) + (1)(60) = 160 = f$ also.
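The row operations of Solution 5.6.2 can be replayed with exact fractions. This is only a sketch of the bookkeeping, under the assumption that the symbolic right side "f − d" of the last row is stored as the number −d; the helper names `add_multiple` and `scale` are made up for this illustration.

```python
from fractions import Fraction

# Tableau rows are [x1, x2, y1, y2 | right side]; the last row holds the
# coefficients of f = 2*x1 + 3*x2, and its right side tracks the constant
# subtracted from f (so "f - 160" is stored as -160).
T = [[Fraction(v) for v in row] for row in (
    [1, 2, 1, 0, 100],
    [1, 1, 0, 1, 60],
    [2, 3, 0, 0, 0],
)]

def add_multiple(T, k, src, dst):
    """Elementary row operation: add k times row src to row dst (0-based)."""
    T[dst] = [d + k * s for s, d in zip(T[src], T[dst])]

def scale(T, k, row):
    """Elementary row operation: multiply a row by the nonzero scalar k."""
    T[row] = [k * v for v in T[row]]

# -(1)+(2); -2(1)+(3)
add_multiple(T, -1, 0, 1)
add_multiple(T, -2, 0, 2)
# -(2)+(3); 2(2)+(1); -(2)
add_multiple(T, -1, 1, 2)
add_multiple(T, 2, 1, 0)
scale(T, -1, 1)

for row in T:
    print([str(v) for v in row])
max_f = -T[2][4]          # last row reads ... = f - d, stored as -d
print("x1 =", T[0][4], "x2 =", T[1][4], "max f =", max_f)
```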


Note that if the form in (5.6.5) is brought to the form in (5.6.6) by elementary operations on the left then $d$ is the maximum value for $f = C'X$. In general, we can show that the maximum of $f = C'X$ subject to the conditions $AX \le b$, $X \ge O$ is the same as the minimum of $g = b'Y$ subject to the conditions $A'Y \ge C$, $Y \ge O$. These two problems are called duals of each other. That is, for an $m \times n$ matrix $A$, $n \times 1$ vector $X$, $m \times 1$ vector $b$, $n \times 1$ vector $C$, $m \times 1$ vector $Y$, with non-negative components in $X$ and $Y$, the following two problems are equivalent:

Maximize $f = C'X$ subject to $AX \le b$, $X \ge O$
is equivalent to
minimize $g = b'Y$ subject to $A'Y \ge C$, $Y \ge O$
and the
maximum of $f$ = minimum of $g$.
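As a hedged illustration of the duality statement, the dual of the problem in Example 5.6.2 can be written out and checked by hand. The dual variables are written $Y' = (y_1, y_2)$ here, which is a naming assumption for this sketch. The last line is the standard short argument for why no feasible value of $f$ can exceed a feasible value of $g$; it is only one half of the full result, which is not proved here.

```latex
\begin{align*}
&\text{Primal: maximize } f = 2x_1 + 3x_2 \text{ subject to }
  x_1 + 2x_2 \le 100,\ x_1 + x_2 \le 60,\ x_1, x_2 \ge 0;\\
&\text{Dual: minimize } g = 100y_1 + 60y_2 \text{ subject to }
  y_1 + y_2 \ge 2,\ 2y_1 + y_2 \ge 3,\ y_1, y_2 \ge 0.\\
&\text{At } (x_1, x_2) = (20, 40):\ f = 2(20) + 3(40) = 160; \qquad
  \text{at } (y_1, y_2) = (1, 1):\ g = 100(1) + 60(1) = 160.\\
&\text{For any feasible } X \text{ and } Y:\quad
  C'X \le (A'Y)'X = Y'(AX) \le Y'b .
\end{align*}
```

The dual solution $(1,1)$ agrees with the vector $C_2$ read from the last row of the reduced tableau in Solution 5.6.2.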


The proof of this result as well as other properties of the simplex method or other linear 
programming methods will not be pursued here. We will do one more example on 
linear programming and then look at some non-linear least squares problems before 
concluding this section. 

In the above two examples we could easily achieve the format in (5.6.6) by row operations without taking into account any other factor. The problem is not as simple as it appears from the above examples. One way of keeping all elements in $b_1$ of (5.6.6) non-negative, when possible, is to adopt the following procedure. Do not start operating with the first row and do not interchange rows to start with. Divide each element in $b$ of (5.6.5) by the corresponding element in the first column of $A$. Look at the smallest of these ratios. If this occurs at the $i$-th row of $A$ then operate with the $i$-th row of $A$ first and reduce all the other elements in the first column of (5.6.5) to zeros. Now look at the resulting $b$ and the resulting elements in the second column of $A$. Repeat the above process to reduce the second column elements in the resulting (5.6.5) to zeros, except the one we are operating with. Repeat the process with the resulting third, fourth, ..., last columns of $A$. This may make the elements of $b_1$ non-negative. Still, this process may not guarantee that the elements in $C_1$ and $C_2$ of (5.6.6) are non-negative. Let us do an example to verify the above steps.

Example 5.6.3. Minimize $g = 3y_1 + 2y_2$ subject to the conditions $y_1 \ge 0$, $y_2 \ge 0$ and

$$\begin{aligned}
2y_1 + y_2 &\ge 8\\
y_1 + y_2 &\ge 5\\
y_1 + 2y_2 &\ge 8.
\end{aligned}$$
Solution 5.6.3. In order to bring the problem to the format of (5.6.5) let us consider the dual problem: Maximize $f = 8x_1 + 5x_2 + 8x_3$ subject to the conditions $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, and

$$\begin{aligned}
2x_1 + x_2 + x_3 &\le 3\\
x_1 + x_2 + 2x_3 &\le 2.
\end{aligned}$$

Form the sub-matrices as in (5.6.5) to obtain the following:

$$\left[\begin{array}{c|c|c} A & I_2 & b\\ \hline C' & O & f\end{array}\right] = \left[\begin{array}{ccc|cc|c} 2 & 1 & 1 & 1 & 0 & 3\\ 1 & 1 & 2 & 0 & 1 & 2\\ \hline 8 & 5 & 8 & 0 & 0 & f\end{array}\right].$$


Divide each element of the last column by the corresponding element in the first column of $A$. That is, $\frac{3}{2}$ and $\frac{2}{1}$. The smaller one is $\frac{3}{2}$, which occurs at the first row. Hence we operate with the first row to start with. Do the following operations to reduce the first column elements to zeros:

$$\tfrac{1}{2}(1);\ -(1)+(2);\ -8(1)+(3) \;\Rightarrow\; \left[\begin{array}{ccc|cc|c} 1 & \frac12 & \frac12 & \frac12 & 0 & \frac32\\ 0 & \frac12 & \frac32 & -\frac12 & 1 & \frac12\\ \hline 0 & 1 & 4 & -4 & 0 & f-12\end{array}\right].$$


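Divide each element of the last column by the corresponding element in the second column of the resulting $A$. That is, $\frac{3}{2}\big/\frac{1}{2} = 3$ and $\frac{1}{2}\big/\frac{1}{2} = 1$. The smaller one occurs at the second row and hence we operate with the second row.

$$-(2)+(1);\ -2(2)+(3);\ 2(2) \;\Rightarrow\; \left[\begin{array}{ccc|cc|c} 1 & 0 & -1 & 1 & -1 & 1\\ 0 & 1 & 3 & -1 & 2 & 1\\ \hline 0 & 0 & 1 & -3 & -2 & f-13\end{array}\right].$$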


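Now it appears that a solution is reached because the last column, $b_1$, has non-negative elements and $-C_2'$ is such that $C_2$ has non-negative elements. But the entry in the position of $-C_1'$ is $+1$, so that $C_1$ is not non-negative, and hence the conditions are not met yet. Now divide the elements in the last column by the third column elements of the resulting $A$. The ratios are $1/(-1) = -1$ and $1/3$; a negative entry cannot be used without making the right side negative, so the smaller usable one occurs at the second row, and now operate with the second row.

$$\tfrac{1}{3}(2);\ (2)+(1);\ -(2)+(3) \;\Rightarrow\; \left[\begin{array}{ccc|cc|c} 1 & \frac13 & 0 & \frac23 & -\frac13 & \frac43\\ 0 & \frac13 & 1 & -\frac13 & \frac23 & \frac13\\ \hline 0 & -\frac13 & 0 & -\frac83 & -\frac83 & f-\frac{40}{3}\end{array}\right].$$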


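Now all elements are in the proper order as in (5.6.6) and hence the maximum of $f = 8x_1 + 5x_2 + 8x_3$, our dual problem, occurs at $x_1 = \frac{4}{3}$, $x_2 = 0$ and $x_3 = \frac{1}{3}$ and the maximum is

$$8\Big(\frac{4}{3}\Big) + 5(0) + 8\Big(\frac{1}{3}\Big) = \frac{40}{3}.$$

The minimum for our starting problem occurs at $y_1 = \frac{8}{3} = y_2$ and the minimum value of $g = 3y_1 + 2y_2$ is given by the following:

$$\text{Minimum of } g = 3\Big(\frac{8}{3}\Big) + 2\Big(\frac{8}{3}\Big) = \frac{40}{3} = \text{maximum of } f.$$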

The student may start with our problem in Example 5.6.3 and try to reduce to the for- 
mat in (5.6.6) without going through the above procedure and see what happens and 
may also construct examples where even after going through the above procedure a 
solution is not reached. Thus, many more points are involved in solving a linear pro- 
gramming problem. What is given above is only an introductory exposure to matrix 
methods in linear programming problems. 
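The column-by-column procedure used in Solution 5.6.3 can be sketched as follows. This is an illustration of the ratio rule as described in the text, not a complete simplex implementation: it processes the columns of A in order, uses only positive column entries when forming ratios, and makes no attempt to detect unbounded or degenerate problems. The function names `pivot` and `reduce_by_ratio_rule` are assumptions of this sketch.

```python
from fractions import Fraction

def pivot(T, r, c):
    """Scale row r so T[r][c] == 1, then clear column c in every other row."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for i in range(len(T)):
        if i != r and T[i][c] != 0:
            k = T[i][c]
            T[i] = [a - k * b for a, b in zip(T[i], T[r])]

def reduce_by_ratio_rule(T, n):
    """Process the first n columns in order, choosing each pivot row by the
    smallest ratio (right side) / (column entry) over positive entries."""
    m = len(T) - 1                      # number of constraint rows
    for c in range(n):
        rows = [i for i in range(m) if T[i][c] > 0]
        if not rows:
            continue                    # no usable pivot in this column
        r = min(rows, key=lambda i: T[i][-1] / T[i][c])
        pivot(T, r, c)
    return T

# Dual problem of Example 5.6.3: columns x1, x2, x3, two dummy variables,
# right side. The right side 0 in the last row stands for "f" and ends as -d.
T = [[Fraction(v) for v in row] for row in (
    [2, 1, 1, 1, 0, 3],
    [1, 1, 2, 0, 1, 2],
    [8, 5, 8, 0, 0, 0],
)]
reduce_by_ratio_rule(T, n=3)

max_f = -T[2][-1]
y = [-T[2][3], -T[2][4]]               # solution of the original minimization
print("x1 =", T[0][-1], "x3 =", T[1][-1], "max f =", max_f, "y =", y)
```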


5.6.2 Nonlinear least squares 


The method of least squares has already been discussed in connection with regression and other model building problems. The problem is to predict a variable $y$ such as the rainfall in a particular region during a particular month. This amount $y$ depends on various factors such as the wind, pressure, temperature and such atmospheric conditions, which can vary, and fixed quantities such as the topographic parameters. Let $x_1,\dots,x_m$ be the variables which may have some relevance in predicting $y$ and let the value of $y$ that we can expect, $E(y)$, for a preassigned set of values of $x_1,\dots,x_m$, be denoted by

$$E(y) = f(x_1,\dots,x_m, a_1,\dots,a_k) \tag{5.6.7}$$

where $a_1,\dots,a_k$ are fixed but unknown parameters and $x_1,\dots,x_m$ are observable variables. If the function $f$ is linear in the unknowns $a_1,\dots,a_k$ then we have a linear model and if $f$ is nonlinear in the parameters then it is called a nonlinear model. What will happen to the method of least squares when $f$ is nonlinear in the unknowns $a_1,\dots,a_k$? This is what we will investigate here. As examples of linear and nonlinear models and the corresponding least squares problems when there are $n$ data points, consider the following:

$$E(y) = a + bx_1 + cx_2 \;\Rightarrow\; \min_{a,b,c}\sum_{i=1}^{n}(y_i - a - bx_{1i} - cx_{2i})^2, \tag{a}$$
$$E(y) = ab^{x} \;\Rightarrow\; \min_{a,b}\sum_{i=1}^{n}(y_i - ab^{x_i})^2, \tag{b}$$
$$E(y) = ae^{-(bx_1 + cx_2)} \;\Rightarrow\; \min_{a,b,c}\sum_{i=1}^{n}\big(y_i - ae^{-(bx_{1i} + cx_{2i})}\big)^2, \tag{c}$$
$$E(y) = a + bx_1x_2 + cx_2^2 \;\Rightarrow\; \min_{a,b,c}\sum_{i=1}^{n}(y_i - a - bx_{1i}x_{2i} - cx_{2i}^2)^2. \tag{d}$$

In (a) and (d) we have linear models whereas in (b) and (c) the models are nonlinear since they are nonlinear functions of the parameters. How to carry out the minimization over the parameters is what we will investigate here. If there are $n$ data points in (5.6.7) then we have the following:


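$$Y = \begin{bmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{bmatrix},\quad X = \begin{bmatrix}x_{11} & x_{21} & \cdots & x_{m1}\\ x_{12} & x_{22} & \cdots & x_{m2}\\ \vdots & \vdots & & \vdots\\ x_{1n} & x_{2n} & \cdots & x_{mn}\end{bmatrix},\quad a = \begin{bmatrix}a_1\\ \vdots\\ a_k\end{bmatrix}$$

where the quantity to be minimized is

$$\psi = \sum_{i=1}^{n}\big[y_i - f(x_{1i},\dots,x_{mi},a_1,\dots,a_k)\big]^2. \tag{5.6.8}$$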
iz 


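The usual approach in minimizing $\psi$ with respect to the parameters $a_1,\dots,a_k$ when $f$ is nonlinear in $a_1,\dots,a_k$ is to reduce the problem into a linear one. Let $\alpha' = (\alpha_1,\dots,\alpha_k)$ be a value of $a$ for which $\psi$ is a minimum. Let $\alpha + \delta$, $\delta' = (\delta_1,\dots,\delta_k)$, be a point in a neighborhood of $\alpha$. Expanding $f$ around $\alpha$ by using a Taylor series and keeping the linear terms we have

$$f(x_{1i},\dots,x_{mi},\alpha_1+\delta_1,\dots,\alpha_k+\delta_k) \approx f(x_{1i},\dots,x_{mi},\alpha_1,\dots,\alpha_k) + \sum_{j=1}^{k}\frac{\partial f_i}{\partial\alpha_j}\delta_j = \langle y_i\rangle,\ \text{say},$$

$$f_i = f(x_{1i},\dots,x_{mi},\alpha_1,\dots,\alpha_k).$$

Then

$$\langle Y\rangle = f_0 + P\delta \tag{5.6.9}$$

where

$$\delta = \begin{bmatrix}\delta_1\\ \vdots\\ \delta_k\end{bmatrix},\quad f_0 = \begin{bmatrix}f_1\\ \vdots\\ f_n\end{bmatrix},\quad P = \begin{bmatrix}\frac{\partial f_1}{\partial\alpha_1} & \cdots & \frac{\partial f_1}{\partial\alpha_k}\\ \vdots & & \vdots\\ \frac{\partial f_n}{\partial\alpha_1} & \cdots & \frac{\partial f_n}{\partial\alpha_k}\end{bmatrix}.$$

Hence

$$\sum_{i=1}^{n}\big[y_i - \langle y_i\rangle\big]^2 = (Y - f_0 - P\delta)'(Y - f_0 - P\delta) = \langle\phi\rangle,\ \text{say}. \tag{5.6.10}$$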


Now (5.6.10) can be looked upon as a linear least squares problem of minimizing $\langle\phi\rangle$ to estimate $\delta$. Differentiating partially with respect to the vector $\delta$ we have

$$\frac{\partial}{\partial\delta}\langle\phi\rangle = O \;\Rightarrow\; P'(Y - f_0 - P\delta) = O.$$

That is,

$$A\delta = g \tag{5.6.11}$$

where

$$A = P'P,\quad g' = (g_1,\dots,g_k),\quad g_j = \sum_{i=1}^{n}(y_i - f_i)\frac{\partial f_i}{\partial\alpha_j}.$$

The classical Gauss method is to use (5.6.11) for successive iterations. This method is also often known as the gradient method.
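The step from (5.6.10) to (5.6.11) can be spelled out as follows. This is a sketch of the intermediate algebra only, using the vector-derivative rules for linear and quadratic forms; the vector form $g = P'(Y - f_0)$ is a restatement of the componentwise definition of $g_j$.

```latex
\begin{align*}
\langle\phi\rangle &= (Y - f_0 - P\delta)'(Y - f_0 - P\delta)\\
  &= (Y - f_0)'(Y - f_0) - 2\,\delta'P'(Y - f_0) + \delta'P'P\,\delta,\\
\frac{\partial\langle\phi\rangle}{\partial\delta}
  &= -2P'(Y - f_0) + 2P'P\,\delta = O
  \;\Rightarrow\; P'P\,\delta = P'(Y - f_0),\\
g &= P'(Y - f_0), \qquad
g_j = \sum_{i=1}^{n}(y_i - f_i)\frac{\partial f_i}{\partial\alpha_j}.
\end{align*}
```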


5.6.3 Marquardt's method 


Marquardt's modification to the gradient method is to minimize $\langle\phi\rangle$ on the sphere $\|\delta\|^2$ = a constant. Under this minimization the modified equations are the following:

$$(A + \lambda I)\delta = g \quad\text{for } \lambda > 0 \tag{5.6.12}$$

where $\lambda$ is an arbitrary constant to be chosen by the algorithm. Various authors have suggested modifications to Marquardt's method. One modification is to replace the various derivatives by the corresponding finite differences in forming equation (5.6.12). Another modification is to consider weighted least squares. Another modification is to take the increment vector in the iterations as an appropriate linear combination of $\delta$ and $g$.


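Example 5.6.4. Illustrate the steps in (5.6.12) if the problem is to fit the model $y = a_1(a_2)^x$ based on the following data:

    x =  0   1   2   3
    y =  3   5  12  22

Solution 5.6.4. Let $\alpha = (\alpha_1, \alpha_2)'$ be the point for $a = (a_1, a_2)'$ where the least squares minimum is obtained. Then, according to the notations in (5.6.12), observing that $(x_1,y_1) = (0,3)$, $(x_2,y_2) = (1,5)$, ..., $(x_4,y_4) = (3,22)$, we have the following:

$$\begin{aligned}
f_1 &= \alpha_1(\alpha_2)^0 = \alpha_1 &&\Rightarrow\ \frac{\partial f_1}{\partial\alpha_1} = 1, &&\frac{\partial f_1}{\partial\alpha_2} = 0,\\
f_2 &= \alpha_1(\alpha_2)^1 = \alpha_1\alpha_2 &&\Rightarrow\ \frac{\partial f_2}{\partial\alpha_1} = \alpha_2, &&\frac{\partial f_2}{\partial\alpha_2} = \alpha_1,\\
f_3 &= \alpha_1(\alpha_2)^2 = \alpha_1\alpha_2^2 &&\Rightarrow\ \frac{\partial f_3}{\partial\alpha_1} = \alpha_2^2, &&\frac{\partial f_3}{\partial\alpha_2} = 2\alpha_1\alpha_2,\\
f_4 &= \alpha_1(\alpha_2)^3 = \alpha_1\alpha_2^3 &&\Rightarrow\ \frac{\partial f_4}{\partial\alpha_1} = \alpha_2^3, &&\frac{\partial f_4}{\partial\alpha_2} = 3\alpha_1\alpha_2^2.
\end{aligned}$$

Then

$$\begin{aligned}
g_1 &= (3 - \alpha_1)(1) + (5 - \alpha_1\alpha_2)(\alpha_2) + (12 - \alpha_1\alpha_2^2)\alpha_2^2 + (22 - \alpha_1\alpha_2^3)\alpha_2^3,\\
g_2 &= (3 - \alpha_1)(0) + (5 - \alpha_1\alpha_2)(\alpha_1) + (12 - \alpha_1\alpha_2^2)(2\alpha_1\alpha_2) + (22 - \alpha_1\alpha_2^3)(3\alpha_1\alpha_2^2).
\end{aligned}$$

The matrices $P$ and $A$ are given by the following:

$$P = \begin{bmatrix}1 & 0\\ \alpha_2 & \alpha_1\\ \alpha_2^2 & 2\alpha_1\alpha_2\\ \alpha_2^3 & 3\alpha_1\alpha_2^2\end{bmatrix},\quad A = P'P = \begin{bmatrix}1 + \alpha_2^2 + \alpha_2^4 + \alpha_2^6 & \alpha_1(\alpha_2 + 2\alpha_2^3 + 3\alpha_2^5)\\ \alpha_1(\alpha_2 + 2\alpha_2^3 + 3\alpha_2^5) & \alpha_1^2(1 + 4\alpha_2^2 + 9\alpha_2^4)\end{bmatrix}.$$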
From (5.6.12) we have

$$\delta = (A + \lambda I)^{-1}g.$$

For Gauss' procedure $\lambda = 0$; take a trial vector $\alpha$, say $\alpha_{(0)}$. Compute $g$, $A$ and $\delta_{(0)} = A^{-1}g$ for this trial vector. Then $\alpha_{(1)} = \alpha_{(0)} + \delta_{(0)}$ is the new trial value of $\alpha$. The success of the iterative procedure depends upon guessing the first trial value $\alpha_{(0)}$ very close to the true $\alpha$ because our expansion is valid only in the neighborhood of the true $\alpha$. For applying Marquardt's procedure start with a trial value of $\lambda$ also. Various techniques are available for selecting a $\lambda_0$ as well as an $\alpha_{(0)}$ and making adjustments at each iteration so that convergence to the true optimal point is reached. Both the gradient method and Marquardt's method fail in many of the standard test problems in this field.
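The iteration described above can be sketched in a few lines for the data of Example 5.6.4. Several choices here are assumptions made for illustration and are not taken from the text: the trial vector (3, 2), the fixed value λ = 1 for the Marquardt-type run, and the iteration counts. Practical Marquardt implementations adjust λ at every iteration; this sketch keeps it constant.

```python
# Data from Example 5.6.4 and the model f(x; a1, a2) = a1 * a2**x.
xs = [0, 1, 2, 3]
ys = [3, 5, 12, 22]

def normal_equations(a1, a2):
    """Return A = P'P (2 x 2) and g = P'(Y - f0) at the trial point (a1, a2)."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        f = a1 * a2 ** x
        # Row of P: partial derivatives of f with respect to a1 and a2.
        p = (a2 ** x, a1 * x * a2 ** (x - 1) if x > 0 else 0.0)
        for r in range(2):
            g[r] += (y - f) * p[r]
            for c in range(2):
                A[r][c] += p[r] * p[c]
    return A, g

def step(a1, a2, lam=0.0):
    """Solve (A + lam*I) delta = g by Cramer's rule; lam = 0 is Gauss' method."""
    A, g = normal_equations(a1, a2)
    m11, m12 = A[0][0] + lam, A[0][1]
    m21, m22 = A[1][0], A[1][1] + lam
    det = m11 * m22 - m12 * m21
    d1 = (g[0] * m22 - m12 * g[1]) / det
    d2 = (m11 * g[1] - m21 * g[0]) / det
    return a1 + d1, a2 + d2

def fit(a1, a2, lam=0.0, iterations=20):
    """Repeat the update alpha <- alpha + delta a fixed number of times."""
    for _ in range(iterations):
        a1, a2 = step(a1, a2, lam)
    return a1, a2

def sse(a1, a2):
    """Sum of squared residuals, the quantity psi of (5.6.8)."""
    return sum((y - a1 * a2 ** x) ** 2 for x, y in zip(xs, ys))

start = (3.0, 2.0)                                  # hypothetical trial vector
gauss = fit(*start, lam=0.0)
marquardt = fit(*start, lam=1.0, iterations=200)    # hypothetical fixed lambda
print(gauss, sse(*gauss))
print(marquardt, sse(*marquardt))
```

At the trial point (3, 2) the formulas of Solution 5.6.4 give A with rows (85, 342) and (342, 1449) and g = (−18, −75), which is what `normal_equations(3.0, 2.0)` evaluates.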
5.6.4 Mathai-Katiyar procedure


Here the equation for iteration is the following:

$$(A + \lambda B)\delta = g$$

where $A = P'P$ and $B = g'gI - gg'$. Here $A$ is the same matrix as in Gauss' and Marquardt's procedures and $B$ is obtained by incorporating the condition that the angle between the increment vector $\delta$ and the gradient vector $g$ is zero when a minimum is reached, and then proceeding with the minimization as before. For the details and a flow-chart for executing the algorithm the interested reader may see [6]. It is shown there that the algorithm always produces convergence to the optimal points, at least in all the standard test problems in the field.
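The matrix B is easy to form once g is known. The sketch below only illustrates its construction and one algebraic property, namely Bg = O, which follows directly from the definition; it is not the full procedure of [6]. The function name `mathai_katiyar_B` is made up for this illustration, and the vector g = (−18, −75) is the value given by the formulas of Example 5.6.4 at the hypothetical trial point (3, 2).

```python
def mathai_katiyar_B(g):
    """Return B = (g'g) I - g g' for a gradient vector g given as a list."""
    gg = sum(v * v for v in g)
    k = len(g)
    return [[(gg if i == j else 0) - g[i] * g[j] for j in range(k)]
            for i in range(k)]

g = [-18, -75]            # g at the hypothetical trial point (3, 2)
B = mathai_katiyar_B(g)

# B annihilates g: (g'g) g - g (g'g) = O.
Bg = [sum(B[i][j] * g[j] for j in range(len(g))) for i in range(len(g))]
print(B)
print(Bg)
```

Since δ'Bδ = (g'g)(δ'δ) − (g'δ)² is zero exactly when δ is proportional to g, the term λB acts only on the component of δ that is not along g, which is consistent with the angle condition mentioned above.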


Exercises 5.6 


Solve Exercises 5.6.1—5.6.4 graphically as well as by using matrix methods. Form the 
dual problem and solve by simplex method when a minimization is involved. 


5.6.1. Maximize x, + 2x, subject to the conditions x, > 0, x; > 0 and 


X,-X,24 
X1 +X% <5 
X-X% <2. 


5.6.2. Minimize $x_1 + 2x_2$ subject to the conditions $x_1 \ge 0$, $x_2 \ge 0$ and

$$\begin{aligned}
2x_1 + x_2 &\ge 8\\
x_1 + x_2 &\ge 6\\
x_1 + 2x_2 &\ge 9.
\end{aligned}$$
5.6.3. Maximize 8x, + 15x, subject to the conditions x, > 0, x; > 0 and 


2x; +3X,22 
2x4 + 5x, 2 3. 


5.6.4. Use simplex method to maximize $2x_1 + x_2 + 6x_3 + x_4$ subject to the conditions $x_1 \ge 0$, $x_2 \ge 0$, $x_3 \ge 0$, $x_4 \ge 0$ and

$$\begin{aligned}
x_1 + x_3 + 2x_4 &\le 5\\
x_2 + x_3 &\le 2\\
x_1 + 3x_2 + x_3 + x_4 &\le 4.
\end{aligned}$$

5.6.5. A carpentry firm manufactures desks, tables and shelves. Each table requires one hour of labor, 10 square feet of wood and 4 quarts of varnish. Each desk requires 3 hours of labor, 35 square feet of wood and 1 quart of varnish. Each shelf needs 45 minutes of labor, 15 square feet of wood and one quart of varnish. At the firm's disposal there are at most 25 hours of labor, at most 350 square feet of wood and at most 55 quarts of varnish. Each table produces a profit of $5, each desk $4 and each shelf $3. How many of each item should be produced so that the firm's profit from this operation is a maximum, and what is that maximum profit?


5.6.6. A dealer packages and sells three types of mixtures of nuts. The dealer has 8 kg (kilograms) of cashews, 24 kg of almonds and 36 kg of peanuts. Mixture type 1 consists of 20% cashews, 20% almonds and 60% peanuts and brings a profit of $2 per kg. Mixture type 2 contains 20% cashews, 40% almonds and 40% peanuts and brings a profit of $4 per kg. Mixture type 3 is of 10% cashews, 30% almonds and 60% peanuts and brings a profit of $3 per kg. How many kg of each mixture should the dealer make to maximize the profit from this operation? What is the maximum profit?


5.6.7. A farmer requires at least 4000 kg (kilograms) of nitrogen, 2000 kg of phosphoric acid and 2000 kg of potash. The farmer can buy 50 kg bags of three types of fertilizers, types 1, 2, 3. Each type 1 bag contains 20 kg nitrogen, 15 kg phosphoric acid and 5 kg potash and costs $20 per bag. Each type 2 bag contains 10 kg nitrogen, 20 kg phosphoric acid and 25 kg potash and costs $30 per bag. Each type 3 bag contains 15 kg phosphoric acid and 20 kg potash and no nitrogen and costs $20 per bag. How many bags of each type of fertilizer should the farmer buy so that the farmer's cost is minimized, and what is the minimum cost?


5.7 A list of some more problems from physical, engineering and social sciences


In order to do a real problem in one of the engineering areas or physical sciences one requires knowledge of the technical terms and a clear understanding of the problem itself. This needs a lot of discussion and explanation, but the reader may not be interested in investing that much time in it. Hence we will only indicate some of the problems where matrices, determinants and eigenvalues play vital roles in simplifying matters.


5.7.1 Turbulent flow of a viscous fluid 


Analysis of turbulent flow of a viscous fluid through a pipe requires what is known as a dimensional matrix of the following form:

          F    L    T    t
     P    1   -2    0    0
     D    0    1    0    0
     V    0    1    0   -1
     ρ    1   -4    0    2
     μ    1   -2    0    1

where P = the resistance per unit area, D = diameter of the pipe, V = fluid velocity, ρ = fluid density, μ = fluid viscosity, F = force, L = length, T = temperature, t = time. The above matrix is formed from the table of dimensions of gas-dynamic quantities.


5.7.2 Compressible flow of viscous fluids 
When the velocity of gas flow is greater than one-half the speed of sound, or when thermal effects are appreciable, compressibility must be accounted for. In this case the dimensional matrix for the solution of this problem is the following:

          F    L    T    t
     ρ    1   -4    0    2
     L    0    1    0    0
     V    0    1    0   -1
     μ    1   -2    0    1
     g    0    1    0   -2
     a    0    1    0   -1
     P    1   -2    0    0

where g = gravitational constant, a = acoustic velocity and the other items remain as in the case of turbulent flow in Section 5.7.1.
5.7.3 Heat loss in a steel rod 


In studying heat loss of a steel rod the dimensional matrix becomes the following, again constructed from the table of dimensions of gas-dynamic quantities:

                H    L    T    t
     Q          1    0    0   -1
     T_s - T_a  0    0    1    0
     l          0    1    0    0
     d          0    1    0    0
     k_s        1   -1   -1   -1
     k_a        1   -1   -1   -1

where H = thermal energy, L = length, T = temperature, t = time, d = diameter of the rod, l = length of the rod, Q = amount of heat rejected per unit time, T_s = temperature of steel, T_a = temperature of air, k_s = thermal conductivity of steel, k_a = thermal conductivity of air.


5.7.4 Small oscillations 


Oscillations of mechanical systems, or of current or voltage in an electrical system, and many such problems fall into this category. Suppose that the displacement from an equilibrium position of the system can be described by the $n \times 1$ vector $X$, $X' = (x_1,\dots,x_n)$, so that the equilibrium position is $X = O$. When the system performs small oscillations its kinetic energy $T$ is represented by a quadratic form of the type

$$T = \frac{1}{2}\dot{X}'A\dot{X}$$

where $A = (a_{ij})$ is a real symmetric positive definite matrix, that is, $T > 0$ for all non-null $\dot{X}$,

$$\dot{X} = \frac{d}{dt}X = \Big(\frac{dx_1}{dt},\dots,\frac{dx_n}{dt}\Big)',$$

the derivative of the components in $X$ with respect to the time variable $t$. The potential energy $V$ for small oscillations from the equilibrium position can be shown to be represented by the quadratic form

$$V = \frac{1}{2}X'CX$$

where $C = C'$ is positive definite or at least positive semi-definite, that is, $V \ge 0$ for all non-null $X$. Equations of motion for the system, with no external forces acting, can be shown to give rise to the differential equation

$$A\ddot{X} + CX = O \tag{5.7.1}$$

where

$$\ddot{X} = \frac{d^2}{dt^2}X.$$

If a solution of the type $X = e^{i\mu t}Y$, $\mu$ real, $i = \sqrt{-1}$, is assumed for (5.7.1) then from (5.7.1) we obtain the equation (see Section 5.5)

$$(-\lambda A + C)Y = O \tag{5.7.2}$$

where $\lambda = \mu^2$. Note that (5.7.2) is an eigenvalue problem where $\lambda$ is the square of a natural frequency $\mu$ of vibration. The largest frequency then corresponds to the largest eigenvalue $\lambda_n$. Writing (5.7.2) in the form

$$(\lambda I - A^{-1}C)Y = O \tag{5.7.3}$$

the $\lambda$'s are the eigenvalues of the matrix $A^{-1}C$ or that of $A^{-\frac12}CA^{-\frac12} = G$ where $A^{\frac12}$ is the real symmetric positive definite square root of the real symmetric positive definite matrix $A$. Simultaneous diagonalizations of $A$ and $C$ lead to the determination of the normal mode of the system. The Rayleigh quotient in this connection is given by

$$R_1(X) = \frac{Z'CZ}{Z'AZ},\quad Z = A^{-\frac12}X$$

and then

$$\lambda_n = \max_{X \ne O} R_1(X),$$

see Section 5.5. If $X_j$ is an eigenvector of $G$ then $Z_j = A^{-\frac12}X_j$ is a normal mode vector for the small vibration problem. If external forces are taken into account then (5.7.1) will be modified with a function of $t$ on the right side. We will not elaborate on this problem any further.


5.7.5 Input-output analysis 


Another important set of problems falls in the category of input-output analysis. Input-output type situations arise in a wide variety of fields. Let $X$, $X' = (x_1,\dots,x_k)$, be the input variables. Suppose that these go through a process, denoted by a matrix of operators $M$, and the resulting quantity is the output, say, $Y$, $Y' = (y_1,\dots,y_n)$. Then the system can be denoted by

$$Y = MX.$$


If M is a matrix of partial differential operators then the output is a system of differen- 
tial equations. For example let 


then 


sY. 


mT [5 03) — 3; 09) + 53s; id - [s 


ð 0 ð 
OX, 0) +4 OX; (x2) OX; (x3) 4 
If $M$ is a linear operator, designated by a constant matrix, then the output is the result of a linear transformation. For example, if $k = 4 = n$ and if

$$M = \begin{bmatrix}1 & -2 & 3 & 1\\ 0 & 1 & -1 & 2\\ 2 & 1 & 1 & -1\\ 2 & 0 & 2 & 3\end{bmatrix}$$

then the output is $Y = MX$ where

$$y_1 = x_1 - 2x_2 + 3x_3 + x_4,\quad y_2 = x_2 - x_3 + 2x_4,$$

$$y_3 = 2x_1 + x_2 + x_3 - x_4,\quad y_4 = 2x_1 + 2x_3 + 3x_4.$$
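For a constant matrix M the output is just a matrix-vector product, as the following small sketch shows. The input vector (1, 2, 3, 4) is a hypothetical choice for illustration, and `apply` is a made-up helper name.

```python
# The constant operator M of the example above.
M = [
    [1, -2, 3, 1],
    [0, 1, -1, 2],
    [2, 1, 1, -1],
    [2, 0, 2, 3],
]

def apply(M, X):
    """Output vector Y = M X for a constant matrix M and an input list X."""
    return [sum(m * x for m, x in zip(row, X)) for row in M]

X = [1, 2, 3, 4]          # hypothetical input vector
Y = apply(M, X)
print(Y)
```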


If x,,Xy,X3 are quantities of 3 items shipped by a firm to two different shops and if M 
represents the per unit sales prices of these items in these shops then the output is the 
vector of revenues from these two shops on these three items. Suppose that the unit 
price matrix is the following: 


m-|5 $2 d 
$2 $2 $2 


Then the output or the revenue vector is given by 


v= irs bonos 


2x, + 2x; + 2X4 


For example, if $x_1 = 10$ kilograms (kg), $x_2 = 5$ kg, $x_3 = 20$ kg then the revenue vector is

$$Y = \begin{bmatrix}\$60\\ \$70\end{bmatrix}.$$


If M is a transition probability matrix and if X, is the initial vector (see Chapters 2 
and 5) then the eventual behavior of the system is given by the output vector 


Y = M? X,. 


Several such input-output situations arise in different fields. The input X can be in the form of a vector or matrix, and the operator M can also be in the form of a vector or matrix, provided that MX is defined; then the output will be a scalar, vector or matrix as determined by MX. Further analysis of such an input-output model will require properties of matrices and knowledge of the nature of the problem at hand.


6 Matrix series and additional properties of matrices 


6.0 Introduction 


The ideas of sequences, polynomials, series, convergence and so on in scalar variables 
will be generalized to matrix variables in this chapter. We start with some basic proper- 
ties of polynomials and then see what happens if the scalar variable in the polynomial 
is replaced by a square matrix. 


6.1 Matrix polynomials 


Here a "matrix polynomial" does not mean a matrix where the elements are polynomials in a scalar variable such as


14x 2x +x? 
2-x+x? xX 


Such a matrix will be called a matrix of polynomials. The term “matrix polynomial” 
will be reserved for the situation where to start with we have a polynomial in a scalar 
variable and we are replacing the scalar variable by a square matrix to obtain a polyno- 
mial in a square matrix. For example, consider a polynomial of degree m in the scalar 
variable x, 


$$p(x) = a_0 + a_1 x + \cdots + a_m x^m, \quad a_m \neq 0 \tag{6.1.1}$$
where $a_0, \ldots, a_m$ are known constants. For example,


$p_1(x) = 4 + 2x - 3x^2$, a polynomial in $x$ of degree 2;
$p_2(x) = 2 + 5x$, a polynomial in $x$ of degree 1;
$p_3(x) = 7$, a polynomial in $x$ of degree 0.


Let
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$


Let us try to construct polynomials $p_1(A), p_2(A), p_3(A)$ in the matrix $A$, corresponding
to the scalar polynomials $p_1(x), p_2(x), p_3(x)$ above. When $x$ in (6.1.1) is replaced by the
matrix $A$ then the constant term $a_0$ will be replaced by $a_0 I$, $I$ the identity matrix. That
is,


$$p(A) = a_0 I + a_1 A + \cdots + a_m A^m, \quad a_m \neq 0. \tag{6.1.2}$$


Open Access. © 2017 Arak M. Mathai, Hans J. Haubold, published by De Gruyter. This work is licensed
under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Thus for our illustrative examples we have
$$p_1(A) = 4I + 2A - 3A^2 = 4\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} + 2\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} - 3\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}^2 = \begin{bmatrix} 3 & 0 & 0 \\ -3 & 0 & -4 \\ -4 & -4 & 0 \end{bmatrix}$$


which is again a 3 x 3 matrix. The following results are obviously true for matrix poly- 
nomials. 


(i) If $p_1(x)$, $p_2(x)$, $q_1(x) = p_1(x) + p_2(x)$, $q_2(x) = p_1(x)p_2(x)$ are polynomials in the
scalar $x$ then for any square matrix $A$,


$$p_1(A) + p_2(A) = q_1(A), \quad p_1(A)p_2(A) = q_2(A).$$
We can note that the factorization properties also go through.
(ii) If
$$p(x) = (x - a)(x - b)$$
where $x$, $a$, $b$ are scalars, $a$ and $b$ free of $x$, then for any square matrix $A$,
$$p(A) = (A - aI)(A - bI)$$
where $I$ is the identity matrix.
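
The evaluation of a matrix polynomial and the factorization property (ii) can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`identity`, `mat_mul`, `mat_add`, `scale`, `mat_poly`) and the scalars $a = 2$, $b = 5$ are illustrative choices, not part of the text.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def mat_poly(coeffs, A):
    """Return a_0 I + a_1 A + ... + a_m A^m for coeffs = [a_0, ..., a_m] (Horner's rule)."""
    n = len(A)
    result = scale(coeffs[-1], identity(n))
    for a in reversed(coeffs[:-1]):
        result = mat_add(mat_mul(result, A), scale(a, identity(n)))
    return result

A = [[1, 0, 0], [0, 1, 1], [1, 1, 1]]

# p_1(x) = 4 + 2x - 3x^2 evaluated at the matrix A
p1_A = mat_poly([4, 2, -3], A)
print(p1_A)

# Property (ii) with the arbitrary scalars a = 2, b = 5:
# (x - a)(x - b) = ab - (a + b)x + x^2
a, b = 2, 5
I3 = identity(3)
left = mat_mul(mat_add(A, scale(-a, I3)), mat_add(A, scale(-b, I3)))
right = mat_poly([a * b, -(a + b), 1], A)
print(left == right)
```

Integer arithmetic is used throughout, so the comparison of the factored and expanded forms is exact.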


Consider the characteristic polynomial of an $n \times n$ matrix $A$. That is,
$$p(\lambda) = |A - \lambda I| = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$$


where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$ and $p(\lambda) = 0$ is the characteristic equation.
Then it is easy to see that $p(A) = O$. That is,


(iii) every $n \times n$ matrix $A$ satisfies its own characteristic equation or


$$(\lambda_1 I - A)(\lambda_2 I - A) \cdots (\lambda_n I - A) = O.$$
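
Property (iii) can be illustrated on a small case. The sketch below assumes a hypothetical $2 \times 2$ matrix with real eigenvalues, obtains them from the quadratic characteristic equation, and multiplies out $(\lambda_1 I - A)(\lambda_2 I - A)$; the matrix and the helper names are illustrative only.

```python
import math

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical 2 x 2 example with real, distinct eigenvalues
A = [[1.0, 3.0], [2.0, 2.0]]

# For a 2 x 2 matrix the characteristic equation is
# lambda^2 - tr(A) lambda + det(A) = 0
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
root = math.sqrt(tr * tr - 4 * det)   # assumes a nonnegative discriminant
lam1, lam2 = (tr + root) / 2, (tr - root) / 2

def shifted(lam):
    """Return lam * I - A."""
    return [[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]]

product = mat_mul(shifted(lam1), shifted(lam2))
print(lam1, lam2)
print(product)   # should be the null matrix up to rounding
```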


6.1.1 Lagrange interpolating polynomial 


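Consider the following polynomial where $\lambda_1, \ldots, \lambda_n$ are distinct quantities free of $\lambda$ and
$a_1, \ldots, a_n$ are constants:
$$p(\lambda) = a_1 \frac{(\lambda - \lambda_2)(\lambda - \lambda_3) \cdots (\lambda - \lambda_n)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3) \cdots (\lambda_1 - \lambda_n)} + a_2 \frac{(\lambda - \lambda_1)(\lambda - \lambda_3) \cdots (\lambda - \lambda_n)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3) \cdots (\lambda_2 - \lambda_n)} + \cdots + a_n \frac{(\lambda - \lambda_1) \cdots (\lambda - \lambda_{n-1})}{(\lambda_n - \lambda_1) \cdots (\lambda_n - \lambda_{n-1})} = \sum_{j=1}^{n} a_j \prod_{i=1,\, i \neq j}^{n} \frac{(\lambda - \lambda_i)}{(\lambda_j - \lambda_i)} \tag{6.1.3}$$


which is a polynomial of degree $n - 1$ in $\lambda$. Put $\lambda = \lambda_1$ in (6.1.3). Then we have $p(\lambda_1) = a_1$.
Similarly $p(\lambda_j) = a_j$, $j = 1, \ldots, n$. Therefore


$$p(\lambda) = \sum_{j=1}^{n} p(\lambda_j) \prod_{i=1,\, i \neq j}^{n} \frac{(\lambda - \lambda_i)}{(\lambda_j - \lambda_i)}. \tag{6.1.4}$$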


The polynomial in (6.1.4) is called the Lagrange interpolating polynomial. A more general
polynomial in this category, allowing multiplicities for $\lambda_1, \ldots, \lambda_n$, is the Hermite interpolating
polynomial, which we will not discuss here. From (6.1.4) we have, for any square
matrix $A$, and $p(\lambda)$ satisfying (6.1.4),


$$p(A) = \sum_{j=1}^{n} p(\lambda_j) \prod_{i=1,\, i \neq j}^{n} \frac{(A - \lambda_i I)}{(\lambda_j - \lambda_i)}. \tag{6.1.5}$$


An interesting application of (6.1.5) is that if $\lambda_1, \ldots, \lambda_n$ are the distinct eigenvalues of
any $n \times n$ matrix $A$ and $p(\lambda)$ is any polynomial of the type in (6.1.4) then the matrix $p(A)$
has the representation in (6.1.5). Let us do an example to highlight this point.


Example 6.1.1. Compute e? where 


Solution 6.1.1. The eigenvalues of A are obviously A, = 1, A, = 4. Let p(A) = e? Then 
from (6.1.4) 

A-A, 

A, -A, 


A-A 
pA) =e =D) + p) 
1 792 


e e20 
= À-4 À - 1). 
3 )+ zí ) 


Therefore from (6.1.5) 


e e? 
p(A)= 3 fA 4I) + 3 (A- 1) 


O -3 0. e^ f 0.0 
E CEBIT 
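
The representation (6.1.5) translates directly into a short routine. The following sketch assumes the distinct eigenvalues are supplied by the caller; the function name `lagrange_matrix_function` and the test matrix (a hypothetical symmetric matrix with eigenvalues 1 and 3, not the matrix of Example 6.1.1) are illustrative. It evaluates $A^3$ and $e^A$ through (6.1.5) and compares them with a direct product and with a partial sum of the exponential series.

```python
import math

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def lagrange_matrix_function(A, eigenvalues, p):
    """Evaluate p(A) through (6.1.5); the eigenvalues must be distinct."""
    n = len(A)
    total = [[0.0] * n for _ in range(n)]
    for j, lam_j in enumerate(eigenvalues):
        term = identity(n)
        for i, lam_i in enumerate(eigenvalues):
            if i != j:
                # factor (A - lam_i I) / (lam_j - lam_i)
                factor = [[(A[r][c] - (lam_i if r == c else 0.0)) / (lam_j - lam_i)
                           for c in range(n)] for r in range(n)]
                term = mat_mul(term, factor)
        for r in range(n):
            for c in range(n):
                total[r][c] += p(lam_j) * term[r][c]
    return total

A = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical example, eigenvalues 1 and 3
eigs = [1.0, 3.0]

cube = lagrange_matrix_function(A, eigs, lambda x: x ** 3)
direct_cube = mat_mul(A, mat_mul(A, A))

exp_A = lagrange_matrix_function(A, eigs, math.exp)
# Independent check: partial sum of sum_k A^k / k!
series = identity(2)
term = identity(2)
for k in range(1, 40):
    term = [[v / k for v in row] for row in mat_mul(term, A)]
    series = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(series, term)]

print(cube, direct_cube)
print(exp_A, series)
```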
6.1.2 A spectral decomposition of a matrix


We will consider the spectral decomposition of a matrix $A$ when the eigenvalues are
distinct. The results hold when some of the eigenvalues are repeated also. In the repeated
case we will need Hermite interpolating polynomials to establish the results.
When the eigenvalues of $A$ are distinct we have the representation in (6.1.5) where
$p(\lambda)$ is a polynomial defined on the set of distinct eigenvalues of $A$ (spectrum of $A$). Let
(6.1.5) be written as


$$p(A) = A_1 + \cdots + A_n. \tag{6.1.6}$$
Let us consider the product $A_1 A_2$. Excluding the constant parts, $A_1$ and $A_2$ are given by
$$A_1 \to (A - \lambda_2 I)(A - \lambda_3 I) \cdots (A - \lambda_n I)$$
and
$$A_2 \to (A - \lambda_1 I)(A - \lambda_3 I) \cdots (A - \lambda_n I).$$


Then


$$A_1 A_2 \to (A - \lambda_1 I)(A - \lambda_2 I)(A - \lambda_3 I)^2 \cdots (A - \lambda_n I)^2.$$
But from property (iii),
$$(\lambda_1 I - A)(\lambda_2 I - A) \cdots (\lambda_n I - A) = O$$
and hence $A_1 A_2 = O$. Similarly $A_i A_j = O$ for all $i$ and $j$, $i \neq j$. Thus $A_1, \ldots, A_n$ are mutually
orthogonal matrices and hence linearly independent. Taking $p(\lambda) = 1$ in (6.1.6) we have
the relation


$$I = B_1 + \cdots + B_n \tag{6.1.7}$$
where
$$B_j = \frac{(A - \lambda_1 I) \cdots (A - \lambda_{j-1} I)(A - \lambda_{j+1} I) \cdots (A - \lambda_n I)}{(\lambda_j - \lambda_1) \cdots (\lambda_j - \lambda_{j-1})(\lambda_j - \lambda_{j+1}) \cdots (\lambda_j - \lambda_n)}$$
and


$$B_i B_j = O \quad \text{for all } i \neq j.$$


Then multiplying both sides of (6.1.7) by $B_j$ we have $B_j = B_j^2$ for each $j$, $j = 1, \ldots, n$.

(iv) In the spectral decomposition of an identity matrix, as given in (6.1.7), the $B_j$'s
are mutually orthogonal and each $B_j$ is an idempotent matrix.


Taking $p(\lambda) = \lambda$ or $p(A) = A$ in (6.1.6) we have the following spectral decomposition
for $A$:


(v) For any $n \times n$ matrix $A$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_n$,
$$A = \lambda_1 B_1 + \cdots + \lambda_n B_n \tag{6.1.8}$$


where the $B_j$'s are defined in (6.1.7).


This can be observed from property (iii) and (6.1.7). Note that
$$\lambda_j B_j = A B_j$$
since $(A - \lambda_j I)B_j = O$ by property (iii). Hence
$$\lambda_1 B_1 + \cdots + \lambda_n B_n = A(B_1 + \cdots + B_n) = A$$


since $B_1 + \cdots + B_n = I$ by (6.1.7). We can also notice some more interesting properties
from (6.1.8):


$$B_i B_j = O = B_j B_i, \quad i \neq j$$
as well as
$$B_j A = A B_j = \lambda_j B_j^2 = \lambda_j B_j.$$


Thus the matrices $A, B_1, \ldots, B_n$ commute and hence all can be reduced to diagonal
forms by a nonsingular matrix $Q$ such that


$$D = \lambda_1 D_1 + \cdots + \lambda_n D_n \tag{6.1.9}$$


where $QAQ^{-1} = D$, $QB_jQ^{-1} = D_j$ for all $j$, $D_i D_j = O$ for all $i \neq j$. The matrices $B_1, \ldots, B_n$ in
(6.1.8) are also called the idempotents of $A$, different from idempotent matrices.


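Example 6.1.2. For the matrix $A = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}$ verify (6.1.7) and (6.1.8).


Solution 6.1.2. The eigenvalues are $\lambda_1 = 4$ and $\lambda_2 = -1$. Two eigenvectors corresponding
to $\lambda_1$ and $\lambda_2$ are
$$X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}.$$
Let
$$Q = (X_1, X_2) = \begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix}, \quad Q^{-1} = \frac{1}{5}\begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix},$$
$$QDQ^{-1} = \frac{1}{5}\begin{bmatrix} 1 & 3 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 4 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 2 & 3 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} = A,$$
$$B_1 = \frac{A - \lambda_2 I}{\lambda_1 - \lambda_2} = \frac{1}{5}\begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix}, \quad B_2 = \frac{A - \lambda_1 I}{\lambda_2 - \lambda_1} = -\frac{1}{5}\begin{bmatrix} -3 & 3 \\ 2 & -2 \end{bmatrix},$$
$$B_1 + B_2 = \frac{1}{5}\begin{bmatrix} 5 & 0 \\ 0 & 5 \end{bmatrix} = I,$$
$$\lambda_1 B_1 + \lambda_2 B_2 = \frac{4}{5}\begin{bmatrix} 2 & 3 \\ 2 & 3 \end{bmatrix} + \frac{1}{5}\begin{bmatrix} -3 & 3 \\ 2 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} = A.$$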

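Example 6.1.3. For the matrix $A$ in Example 6.1.2 compute $Q$ such that $Q^{-1}AQ$ is diagonal.
Also establish (6.1.9).


Solution 6.1.3. With $Q$ as in Solution 6.1.2, by straight multiplication


$$Q^{-1}B_1 Q = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
and
$$Q^{-1}B_2 Q = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.$$


Taking the linear combination $\lambda_1 Q^{-1}B_1Q + \lambda_2 Q^{-1}B_2Q = \operatorname{diag}(4, -1) = Q^{-1}AQ$, (6.1.9) is established.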
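
The identities (6.1.7) and (6.1.8) and the idempotent and orthogonality properties can be verified with exact rational arithmetic. The sketch below assumes that the matrix of Example 6.1.2 is the $2 \times 2$ matrix with rows $(1, 3)$ and $(2, 2)$, whose eigenvalues are 4 and $-1$; the helper `idempotent` is a hypothetical name for the $n = 2$ case of $B_j$.

```python
from fractions import Fraction

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Assumed matrix of Example 6.1.2 and its eigenvalues
A = [[Fraction(1), Fraction(3)], [Fraction(2), Fraction(2)]]
lam1, lam2 = Fraction(4), Fraction(-1)
I2 = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
O2 = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]

def idempotent(lam_other, lam_own):
    """(A - lam_other I) / (lam_own - lam_other): B_j of (6.1.7) when n = 2."""
    return [[(A[r][c] - lam_other * I2[r][c]) / (lam_own - lam_other)
             for c in range(2)] for r in range(2)]

B1 = idempotent(lam2, lam1)
B2 = idempotent(lam1, lam2)

sum_B = [[B1[r][c] + B2[r][c] for c in range(2)] for r in range(2)]
weighted = [[lam1 * B1[r][c] + lam2 * B2[r][c] for c in range(2)] for r in range(2)]

print(sum_B == I2)               # (6.1.7)
print(weighted == A)             # (6.1.8)
print(mat_mul(B1, B2) == O2)     # mutual orthogonality
print(mat_mul(B1, B1) == B1)     # idempotency
```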


(vi) In the spectral representation of any n x n matrix A with distinct eigenvalues, 
as in (6.1.8), the rank of Bj for each j cannot exceed 1. 


6.1.3 An application in statistics 


In the spectral decomposition of an $n \times n$ matrix $A$, as given in (6.1.8), each $B_j$ is idempotent.
If $A$ is real symmetric then $B_j$, $j = 1, \ldots, n$ are also real symmetric since the eigenvalues
of a real symmetric matrix are real. If the eigenvalues of $A$ are all distinct then
each $B_j$ is of rank 1. Consider $X$ an $n \times 1$ real Gaussian vector random variable having
a standard Gaussian distribution. In our notation $X \sim N_n(O, I)$ where $I$ is an identity
matrix. Consider the quadratic form $X'AX$. Then


$$X'AX = \lambda_1 X'B_1X + \cdots + \lambda_n X'B_nX.$$


Since $B_j = B_j^2 = B_j'$ and since $X \sim N_n(O, I)$ it follows that $X'B_jX \sim \chi_1^2$, that is, $X'B_jX$ is a
real chisquare random variable with one degree of freedom. Since $B_iB_j = O$, $i \neq j$, these
chisquare random variables are mutually independently distributed. Thus one has a
representation


$$X'AX = \lambda_1 y_1 + \cdots + \lambda_n y_n$$


where the $y_1, \ldots, y_n$ are mutually independently distributed chisquare random variables
with one degree of freedom each when the $\lambda_j$'s are distinct. One interesting aspect
is that in each $B_j$ all the eigenvalues of $A$ are present.


Exercises 6.1 


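6.1.1. If $A$ is symmetrically partitioned to the form


$$A = \begin{bmatrix} A_{11} & A_{12} \\ O & I \end{bmatrix}$$


then show that for any positive integer $n$,


$$A^n = \begin{bmatrix} A_{11}^n & p(A_{11})A_{12} \\ O & I \end{bmatrix}$$
where
$$p(x) = \frac{x^n - 1}{x - 1}.$$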
6.1.2. Compute e^7^ where 
A- 1 2 
3« 5 
6.1.3. Compute sin A where 
z 2 0 0 
A= ii 4 1 O 
-2 5 -2 


6.1.4. Spectrum of a matrix $A$. The spectrum of a matrix is the set of all distinct eigenvalues
of $A$. If $B = QAQ^{-1}$ and if $f(\lambda)$ is a polynomial defined on the spectrum of $A$ then
show that


$$f(B) = Qf(A)Q^{-1}.$$
Prove the result when the eigenvalues are distinct. The result is also true when some
eigenvalues are repeated.


6.1.5. If $A$ is a block diagonal matrix, $A = \operatorname{diag}(A_1, A_2, \ldots, A_k)$, and if $f(\lambda)$ is a polynomial
defined on the spectrum of $A$ then show that


$$f(A) = \operatorname{diag}(f(A_1), f(A_2), \ldots, f(A_k)).$$


6.1.6. If $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of an $n \times n$ matrix $A$ and if $f(\lambda)$ is a polynomial
defined on the spectrum of $A$ then show that the eigenvalues of $f(A)$ are


$$f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n).$$


6.1.7. For any square matrix $A$ show that $e^{kA}$, where $k$ is a nonzero scalar, is a nonsingular
matrix.


6.1.8. If $A$ is a real symmetric positive definite matrix then show that there exists a
unique Hermitian matrix $B$ such that $A = e^{B}$.


6.1.9. By using the ideas from Exercise 6.1.3, or otherwise, show that for any $n \times n$
matrix $A$


$$e^{iA} = \cos A + i \sin A, \quad i = \sqrt{-1}.$$


6.1.10. For the matrix A = (29) compute In A, if it exists. 


6.2 Matrix sequences and matrix series 


We will introduce matrix sequences and matrix series and concepts analogous to convergence
of series in scalar variables. A few properties of matrix sequences will be
considered first. Then we will look at convergence of matrix series and we will also
introduce a concept called “norm of a matrix”, analogous to the concept of “distance”
in scalar variables, for measuring the rate of convergence of a matrix series.


6.2.1 Matrix sequences 


Let $A_1, A_2, \ldots$ be a sequence of $m \times n$ matrices so that the $k$-th member in this sequence
of matrices is $A_k$. Let the $(i,j)$-th element in $A_k$ be denoted by $a_{ij}^{(k)}$ so that $A_k = (a_{ij}^{(k)})$,
$k = 1, 2, \ldots$. The elements $a_{ij}^{(k)}$ are real or complex numbers.


Definition 6.2.1 (Convergence of a sequence of matrices). For scalar sequences we
say that the limit of $a_{ij}^{(k)}$, as $k \to \infty$, is $a_{ij}$ if there exists a finite number $a_{ij}$ such
that $a_{ij}^{(k)} \to a_{ij}$ when $k \to \infty$. Convergence of a matrix sequence is defined through
element-wise convergence. Thus if $a_{ij}^{(k)} \to a_{ij}$ for all $i$ and $j$ when $k \to \infty$ we say that $A_k$
converges to $A = (a_{ij})$ as $k \to \infty$.

Example 6.2.1. Check for the convergence of the sequence $A_1, A_2, \ldots$ as well as that of
the sequence $B_1, B_2, \ldots$ where


Solution 6.2.1. Let us check the sequence A,,4,, .... Here 


(1 ( Kk (o 1 O_o 
üq4—-—, a,= , @, =-2+ -, a, =e". 
H 7 2 INE 21 k 22 
é PENES ; k ; k 
lim a = lim — = O, lim a) = lim — = 1, 
k= u k= 2k k= 12 k>œ1+k 


S ; 1 z k ; 
lim af? = lim [-2+ =] --2, lim af? = lim e* - 0. 
k= a koco k koo 2 k—oo 


Hence
$$\lim_{k \to \infty} A_k = A = \begin{bmatrix} 0 & 1 \\ -2 & 0 \end{bmatrix}$$

and the sequence is a convergent sequence. Now, consider $B_1, B_2, \ldots$. Here


2k 
p? =(-1 k, p® = 0, po = ek, po = : 
Hu =(-1) 12 2 no ay 
Evidently 
: k ; ; k a 2k 
pursue ue pce 


But $(-1)^k$ oscillates from $-1$ to $1$ and hence there is no limit as $k \to \infty$. Also $e^k \to \infty$
when $k \to \infty$. Hence the sequence $B_1, B_2, \ldots$ is divergent.


(i) For any sequence $A_1, A_2, \ldots$ where $A_k = (a_{ij}^{(k)})$ we say that the sequence is divergent
if for at least one element in $A_k$ either the limit does not exist or the limit is $\pm\infty$.


The following properties are evident from the definition itself.


(ii) Let $A_1, A_2, \ldots$ and $B_1, B_2, \ldots$ be convergent sequences of matrices where $A_k \to A$
and $B_k \to B$ as $k \to \infty$. Then


$$A_k + B_k \to A + B, \quad A_k B_k \to AB,$$
$$QA_kQ^{-1} \to QAQ^{-1}, \quad \operatorname{diag}(A_k, B_k) \to \operatorname{diag}(A, B),$$
$$\alpha_k A_k \to \alpha A$$

when $\alpha_k \to \alpha$, where $\alpha_1, \alpha_2, \ldots$ is a sequence of scalars.

By combining with the ideas of matrix polynomials from Section 6.1 we can establish
the following properties. Since we have only considered Lagrange interpolating polynomials
in Section 6.1 we will state the results when the eigenvalues of the $n \times n$ matrix
$A$ are distinct. But analogous results are available when some of the eigenvalues are
repeated also.


(iii) Let the scalar functions $f_1(\lambda), f_2(\lambda), \ldots$ be defined on the spectrum of an $n \times n$
matrix $A$ and let the sequence $A_1, A_2, \ldots$ be defined as $A_k = f_k(A)$, $k = 1, 2, \ldots$. Then
the sequence $A_1, A_2, \ldots$ converges, for $k \to \infty$, if and only if the scalar sequences
$\{f_1(\lambda_1), f_2(\lambda_1), \ldots\}, \{f_1(\lambda_2), f_2(\lambda_2), \ldots\}, \ldots, \{f_1(\lambda_n), f_2(\lambda_n), \ldots\}$ converge, as $k \to \infty$, where
$\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.


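Example 6.2.2. For the matrix $A$ show that


$$e^{tA} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}, \quad \text{where } A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$


Solution 6.2.2. The eigenvalues of $A$ are $\pm i$, $i = \sqrt{-1}$. Take $p(\lambda) = e^{t\lambda}$ and apply (6.1.5)
of Section 6.1. Then


$$e^{tA} = e^{it}\frac{(A + iI)}{2i} + e^{-it}\frac{(A - iI)}{-2i} = \frac{e^{it}}{2i}\begin{bmatrix} i & 1 \\ -1 & i \end{bmatrix} - \frac{e^{-it}}{2i}\begin{bmatrix} -i & 1 \\ -1 & -i \end{bmatrix}$$
$$= \begin{bmatrix} \frac{e^{it} + e^{-it}}{2} & \frac{e^{it} - e^{-it}}{2i} \\ -\frac{e^{it} - e^{-it}}{2i} & \frac{e^{it} + e^{-it}}{2} \end{bmatrix} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}.$$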
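
The computation in Solution 6.2.2 can be reproduced with complex arithmetic. The sketch below applies (6.1.5) with the eigenvalues $\pm i$ for one arbitrarily chosen value $t = 0.7$ (an assumption made only for illustration) and compares the result with the rotation-type matrix claimed in the example.

```python
import cmath
import math

t = 0.7                      # arbitrary illustrative value of t
A = [[0, 1], [-1, 0]]
lam1, lam2 = 1j, -1j         # eigenvalues of A

def term(lam_j, lam_i):
    """p(lam_j) (A - lam_i I) / (lam_j - lam_i) with p(lam) = exp(t lam)."""
    w = cmath.exp(t * lam_j) / (lam_j - lam_i)
    return [[w * (A[r][c] - (lam_i if r == c else 0)) for c in range(2)]
            for r in range(2)]

T1, T2 = term(lam1, lam2), term(lam2, lam1)
exp_tA = [[T1[r][c] + T2[r][c] for c in range(2)] for r in range(2)]

expected = [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]
max_diff = max(abs(exp_tA[r][c] - expected[r][c]) for r in range(2) for c in range(2))
print(exp_tA)
print(max_diff)
```

The imaginary parts of the computed entries vanish up to rounding, as the closed form predicts.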


6.2.2 Matrix series 


A matrix series is obtained by adding up the matrices in a matrix sequence. For example,
if $A_0, A_1, A_2, \ldots$ is a matrix sequence then the corresponding matrix series is given
by


$$f(A) = \sum_{k=0}^{\infty} A_k. \tag{6.2.1}$$


If the matrix series is a power series then we will be considering powers of matrices
and hence in this case the series will be defined only for $n \times n$ matrices. For an $n \times n$
matrix $A$ consider the series


$$g(A) = a_0 I + a_1 A + \cdots + a_k A^k + \cdots = \sum_{k=0}^{\infty} a_k A^k \tag{6.2.2}$$
where $a_0, a_1, \ldots$ are scalars. This is a matrix power series. As in the case of scalar series,
convergence of a matrix series will be defined in terms of the convergence of the
sequence of partial sums.


Definition 6.2.2 (Convergence of a matrix series). Let $f(A)$ be a matrix series as in
(6.2.1). Consider the partial sums $S_0, S_1, \ldots$ where


$$S_k = A_0 + A_1 + \cdots + A_k.$$


If the sequence $S_0, S_1, \ldots$ is convergent then we say that the series in (6.2.1) is convergent.
[If it is a power series as in (6.2.2) then $A_k = a_k A^k$ and then the above definition
applies.]


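Example 6.2.3. Check the convergence of the series $f_1(A)$ and $f_2(B)$ where
$$f_1(A) = \sum_{k=0}^{\infty} A_k, \quad A_k = \begin{bmatrix} y^k & 2^{-k} \\ \frac{x^k}{k!} & \frac{(-1)^k \theta^{2k}}{(2k)!} \end{bmatrix}$$
and
$$f_2(B) = \sum_{k=0}^{\infty} B_k, \quad B_k = [\sin k\pi, \ \cos k\pi].$$


Solution 6.2.3. The sum of the first $m + 1$ terms in $f_1(A)$ is given by
$$S_m = \sum_{k=0}^{m} A_k = \begin{bmatrix} \sum_{k=0}^{m} y^k & \sum_{k=0}^{m} 2^{-k} \\ \sum_{k=0}^{m} \frac{x^k}{k!} & \sum_{k=0}^{m} \frac{(-1)^k \theta^{2k}}{(2k)!} \end{bmatrix}.$$


Convergence of the series in $f_1(A)$ depends upon the convergence of the individual
elements in $S_m$ as $m \to \infty$. Note that


$$\sum_{k=0}^{\infty} y^k = 1 + y + y^2 + \cdots = (1 - y)^{-1} \text{ if } |y| < 1 \text{ and } +\infty \text{ if } y \geq 1;$$
$$\sum_{k=0}^{\infty} 2^{-k} = \frac{1}{1 - \frac{1}{2}} = 2;$$
$$\sum_{k=0}^{\infty} \frac{x^k}{k!} = 1 + x + \frac{x^2}{2!} + \cdots = e^x;$$
$$\sum_{k=0}^{\infty} \frac{(-1)^k \theta^{2k}}{(2k)!} = 1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots = \cos\theta.$$


Hence the series in $f_1(A)$ is convergent for $|y| < 1$ and diverges if $y \geq 1$. Now, consider
$f_2(B)$. The partial sums are, for $m = 0, 1, \ldots$,
$$S_m = \sum_{k=0}^{m} B_k = \Big[\sum_{k=0}^{m} \sin k\pi, \ \sum_{k=0}^{m} \cos k\pi\Big].$$
But $\sum_{k=0}^{m} \sin k\pi = 0$ for all $m$ whereas $\sum_{k=0}^{m} \cos k\pi$ oscillates between 0 and 1 and hence
the sequence of partial sums for this series is not convergent. Thus the series in $f_2(B)$
is not convergent.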


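Example 6.2.4. Check for the convergence of the following series in the $n \times n$ matrix $A$:
$$f(A) = I + A + A^2 + \cdots.$$


Solution 6.2.4. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of $A$. Let us consider the case when
the eigenvalues of $A$ are distinct. Then there exists a nonsingular matrix $Q$ such that


$$Q^{-1}AQ = D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$$
and
$$Q^{-1}A^mQ = D^m = \operatorname{diag}(\lambda_1^m, \ldots, \lambda_n^m), \quad m = 1, 2, \ldots.$$
Then
$$Q^{-1}f(A)Q = I + D + D^2 + \cdots.$$
The $j$-th diagonal element on the right is then
$$1 + \lambda_j + \lambda_j^2 + \cdots = (1 - \lambda_j)^{-1} \quad \text{if } |\lambda_j| < 1, \ j = 1, \ldots, n,$$


which are the eigenvalues of $(I - A)^{-1}$. Then if $|\lambda_j| < 1$ for $j = 1, 2, \ldots, n$ the series is convergent
and the sum is $(I - A)^{-1}$ or


$$I + A + A^2 + \cdots = (I - A)^{-1} \quad \text{for } |\lambda_j| < 1, \ j = 1, \ldots, n.$$


We can also derive the result from (6.1.5) of Section 6.1. The result also holds good
even if some eigenvalues are repeated. We can state the exponential and trigonometric
series as follows: For any $n \times n$ matrix $A$,


$$\sin A = \sum_{k=0}^{\infty} \frac{(-1)^k A^{2k+1}}{(2k+1)!}, \quad \cos A = \sum_{k=0}^{\infty} \frac{(-1)^k A^{2k}}{(2k)!},$$
$$\sinh A = \sum_{k=0}^{\infty} \frac{A^{2k+1}}{(2k+1)!}, \quad \cosh A = \sum_{k=0}^{\infty} \frac{A^{2k}}{(2k)!},$$
$$e^{A} = \sum_{k=0}^{\infty} \frac{A^k}{k!} \tag{6.2.3}$$


and further, when the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $A$ are such that $|\lambda_j| < 1$, $j = 1, \ldots, n$ then
the binomial and logarithmic series are given by the following:


$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k, \quad \ln(I + A) = \sum_{k=1}^{\infty} \frac{(-1)^{k-1} A^k}{k}. \tag{6.2.4}$$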
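
The convergence of $I + A + A^2 + \cdots$ to $(I - A)^{-1}$ can be observed numerically. The following sketch uses a hypothetical $2 \times 2$ matrix whose eigenvalues are $0.5$ and $-0.5$, sums 61 terms of the series, and compares the partial sum with the inverse obtained from the $2 \times 2$ inverse formula; the number of terms is an arbitrary choice.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical lower triangular matrix with eigenvalues 0.5 and -0.5
A = [[0.5, 0.0], [0.25, -0.5]]

# (I - A)^{-1} through the 2 x 2 inverse formula
M = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
inverse = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

# Partial sum I + A + ... + A^60
partial = [[1.0, 0.0], [0.0, 1.0]]
power = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(60):
    power = mat_mul(power, A)
    partial = [[partial[r][c] + power[r][c] for c in range(2)] for r in range(2)]

max_diff = max(abs(partial[r][c] - inverse[r][c]) for r in range(2) for c in range(2))
print(inverse)
print(partial)
print(max_diff)
```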
6.2.3 Matrix hypergeometric series


A general hypergeometric series ${}_pF_q(\cdot)$ in a real scalar variable $x$ is defined as follows:


$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; x) = \sum_{r=0}^{\infty} \frac{(a_1)_r \cdots (a_p)_r}{(b_1)_r \cdots (b_q)_r} \frac{x^r}{r!} \tag{6.2.5}$$


where, for example,
$$(a)_m = a(a+1) \cdots (a+m-1), \quad (a)_0 = 1, \ a \neq 0.$$
For example,
$${}_0F_0(\ ;\ ; x) = e^x, \quad {}_1F_0(a;\ ; x) = (1 - x)^{-a} \text{ for } |x| < 1.$$


In (6.2.5) there are $p$ upper parameters $a_1, \ldots, a_p$ and $q$ lower parameters $b_1, \ldots, b_q$.
The series in (6.2.5) is convergent for all $x$ if $q \geq p$, convergent for $|x| < 1$ if $p = q + 1$,
divergent if $p > q + 1$ and the convergence conditions for $x = 1$ and $x = -1$ can also be
worked out. A matrix series in an $n \times n$ matrix $A$, corresponding to the right side in
(6.2.5) is obtained by replacing $x$ by $A$. Thus we may define a hypergeometric series in
an $n \times n$ matrix $A$ as follows:


$${}_pF_q(a_1, \ldots, a_p; b_1, \ldots, b_q; A) = \sum_{r=0}^{\infty} \frac{(a_1)_r \cdots (a_p)_r}{(b_1)_r \cdots (b_q)_r} \frac{A^r}{r!} \tag{6.2.6}$$
where $a_1, \ldots, a_p$, $b_1, \ldots, b_q$ are scalars. The series on the right in (6.2.6) is convergent
for all $A$ if $q \geq p$, convergent for $p = q + 1$ when the eigenvalues of $A$ are all less than 1
in absolute value, and divergent when $p > q + 1$.


Example 6.2.5. If possible, sum up the series


$$I + 3A + \frac{1}{2!}\big[(3)(4)A^2 + (4)(5)A^3 + \cdots\big]$$

where 
1 
2 2 0 
A = 1 = 3 » 
2." Be uu. 


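Solution 6.2.5. Consider the scalar series
$$1 + 3x + \frac{1}{2!}\big[(3)(4)x^2 + (4)(5)x^3 + \cdots\big] = 1 + 3x + \frac{(3)(4)}{2!}x^2 + \frac{(3)(4)(5)}{3!}x^3 + \cdots = (1 - x)^{-3} \quad \text{for } |x| < 1.$$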


In our matrix A, the eigenvalues are A, 1 A, -i A3 -i and therefore IAjl « 1, 
$j = 1, 2, 3$. Hence the series can be summed up into a ${}_1F_0$ type hypergeometric series or
a binomial series and the sum is then
1 o o0]? 
Q-A) =|1 4 0 
3 
2 3 3 
But 
2 0 0 
- 3 3 
0-A)!-|-3 à 0 
1 3 2 
3: "og 
and 
2.35720] 
= -113 3 3 
0-4A)?-[dn-4)'|-|-5 ż 0 
1 .3 2 
3 2 3 
8 0 0 
- 291 27 
sw 0 
4153 . 27 8 
432 96 27 


6.2.4 The norm of a matrix 


For a $1 \times 1$ vector or a scalar quantity $\alpha$ the absolute value, $|\alpha|$, is a measure of its magnitude.
For an $n \times 1$ vector $X$, $X' = (x_1, \ldots, x_n)$,


$$\|X\| = \{|x_1|^2 + \cdots + |x_n|^2\}^{\frac{1}{2}}, \tag{6.2.7}$$


where $|x_j|$ denotes the absolute value of $x_j$, $j = 1, \ldots, n$, and this can be taken as a measure
of its magnitude. Equation (6.2.7) is its Euclidean length also. This Euclidean
length satisfies some interesting properties:


(a) $\|X\| \geq 0$ for all $X$ and $\|X\| = 0$ if and only if $X = O$ (null);
(b) $\|\alpha X\| = |\alpha| \, \|X\|$ where $\alpha$ is a scalar quantity;
(c) $\|X + Y\| \leq \|X\| + \|Y\|$, the triangular inequality. (6.2.8)


If (a), (b), (c) are taken as postulates or axioms to define a norm of the vector $X$,
denoted by $\|X\|$, then one can see that, not only the Euclidean length but also other
items satisfy (a), (b), (c).


Definition 6.2.3 (Norm of a vector and distance between vectors). For $X$ an $n \times 1$
vector, or an element in a general vector subspace $S$ where a norm can be defined, a
measure satisfying (a), (b), (c) above will be called a norm of $X$ and it will be denoted
by $\|X\|$. Note that $X$ replaced by $X - Y$ and satisfying (a), (b), (c) is called a distance
between $X$ and $Y$.

It is not difficult to show that the following measures are also norms of the vector $X$:

$$\|X\|_1 = \sum_{j=1}^{n} |x_j|;$$
$$\|X\|_2 = \Big[\sum_{j=1}^{n} |x_j|^2\Big]^{\frac{1}{2}} = (X^*X)^{\frac{1}{2}} \quad \text{(the Euclidean norm)}$$
where $X^*$ denotes the complex conjugate transpose of $X$;
$$\|X\|_p = \Big[\sum_{j=1}^{n} |x_j|^p\Big]^{\frac{1}{p}}, \quad p \geq 1 \quad \text{(the Hölder norms)}$$
$$\|X\|_\infty = \max_{1 \leq j \leq n} |x_j| \quad \text{(the infinite norm)}. \tag{6.2.9}$$

Example 6.2.6. Show that $\|X\|_1$ satisfies the conditions (a), (b), (c) in (6.2.8).


Solution 6.2.6. $|x_j|$ being the absolute value of $x_j$ cannot be zero unless $x_j$ itself is
zero. If $x_j \neq 0$ then $|x_j| > 0$ by definition whether $x_j$ is real or complex. Thus condition
(a) is obviously satisfied. Note that for any two scalars $\alpha$ and $x_j$, $|\alpha x_j| = |\alpha| \, |x_j|$. Hence
(b) is satisfied. Also for any two scalars $x_j$ and $y_j$ the triangular inequality
$|x_j + y_j| \leq |x_j| + |y_j|$ holds, and summing over $j$ gives (c). Thus
$\|X\|_1$ satisfies (a), (b), (c) of (6.2.8).

The following properties are immediate from the definition itself:
(a) $|\,\|X\| - \|Y\|\,| \leq \|X + Y\| \leq \|X\| + \|Y\|$.
(b) $\|-X\| = \|X\|$.
(c) If $\|X\|$ is a norm of $X$ then $k\|X\|$, $k > 0$ is also a norm of $X$.
(d) $|\,\|X\| - \|Y\|\,| \leq \|X - Y\|$.
(e) $\|U\|_2 = \|X\|_2$ where $U = AX$, $A$ is a unitary matrix (orthonormal if real).
(f) $\|X\|_1 \geq \|X\|_2 \geq \cdots \geq \|X\|_\infty$.
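
The vector norms in (6.2.9) are straightforward to compute. The sketch below defines them in plain Python (the function names and the two sample complex vectors are illustrative choices) and checks the ordering in property (f) as well as the triangular inequality for each norm.

```python
def norm_1(x):
    return sum(abs(v) for v in x)

def norm_p(x, p):
    """Hölder norm for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def norm_2(x):
    return norm_p(x, 2)

def norm_inf(x):
    return max(abs(v) for v in x)

# Two arbitrary sample vectors with complex entries
X = [3, -4, 1 + 1j]
Y = [1, 2, -2j]
S = [a + b for a, b in zip(X, Y)]

print(norm_1(X), norm_2(X), norm_p(X, 3), norm_inf(X))

# Property (f): the norms decrease as p increases
ordered = norm_1(X) >= norm_2(X) >= norm_p(X, 3) >= norm_inf(X)

# Triangular inequality (c) of (6.2.8) for each norm
triangle = all(f(S) <= f(X) + f(Y) for f in (norm_1, norm_2, norm_inf))
print(ordered, triangle)
```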


Now let us see how we can define a norm of a matrix as a single number which should 
have the desirable properties (a), (b), (c) of (6.2.8). But there is an added difficulty 
here. If we consider two matrices, an n x n matrix A and an n x 1 matrix X, then AX 
is again an n x 1 matrix which is also an n-vector. Hence any definition that we take 
for the norm of a matrix must be compatible with matrix multiplication. Therefore an 
additional postulate is required. 


Definition 6.2.4 (A norm of a matrix A). A single number, denoted by $\|A\|$, is called a
norm of the matrix $A$ if it satisfies the following four postulates:

(a) $\|A\| \geq 0$ and $\|A\| = 0$ if and only if $A$ is a null matrix.

(b) $\|cA\| = |c| \, \|A\|$ when $c$ is a scalar.

(c) $\|A + B\| \leq \|A\| + \|B\|$ whenever $A + B$ is defined.

(d) $\|AB\| \leq \|A\| \, \|B\|$ whenever $AB$ is defined.

It is not difficult to see that the following quantities qualify to be the norms of the
matrix $A = (a_{ij})$:


$$\|A\|_{(p)} = \Big(\sum_{i,j=1}^{n} |a_{ij}|^p\Big)^{\frac{1}{p}}, \quad 1 \leq p \leq 2 \quad \text{(Hölder norm, not a norm for } p > 2\text{)}, \tag{6.2.10}$$
$$\|A\|_2 = \Big(\sum_{i,j=1}^{n} |a_{ij}|^2\Big)^{\frac{1}{2}} \quad \text{(Euclidean norm)}, \tag{6.2.11}$$
$$\|A\|_3 = n \max_{i,j} |a_{ij}|, \tag{6.2.12}$$
$$\|A\|_4 = \max_{i} \sum_{j} |a_{ij}|, \tag{6.2.13}$$
$$\|A\|_5 = \max_{j} \sum_{i} |a_{ij}|, \tag{6.2.14}$$
$$\|A\|_6 = s_1 \tag{6.2.15}$$
where $s_1$ is the largest singular value of $A$;
$$\|A\|_7 = \sup_{X \neq O} \frac{\|AX\|}{\|X\|} \tag{6.2.16}$$
where $\|AX\|$ and $\|X\|$ are vector norms, the same norm;
$$\|A\|_8 = \max_{X, \ \|X\| = 1} \|AX\| \tag{6.2.17}$$
where the same vector norm is taken in each case. As a numerical example let us consider the
following matrix:

$$A = \begin{bmatrix} 1 + i & 0 \\ 1 & -1 \end{bmatrix}.$$

Then
$$\|A\|_{(1)} = |(1+i)| + |(0)| + |(1)| + |(-1)| = \sqrt{2} + 0 + 1 + 1 = 2 + \sqrt{2};$$
$$\|A\|_2 = [2 + 0 + 1 + 1]^{\frac{1}{2}} = 2;$$
$$\|A\|_3 = 2\max[(\sqrt{2}, 0, 1, 1)] = 2\sqrt{2};$$
$$\|A\|_4 = \max[|(1+i)| + |(0)|, \ |(1)| + |(-1)|] = 2;$$
$$\|A\|_5 = \max[|(1+i)| + |(1)|, \ |(0)| + |(-1)|] = 1 + \sqrt{2}.$$


For computing $\|A\|_6$ we need the eigenvalues of $A^*A$:


$$A^*A = \begin{bmatrix} 1 - i & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 + i & 0 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ -1 & 1 \end{bmatrix}.$$


The eigenvalues of $A^*A$ are $2 \pm \sqrt{2}$ and then the largest singular value of $A$ is


$$[(2 + \sqrt{2})]^{\frac{1}{2}} = \|A\|_6.$$
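
The norms of the numerical example can be recomputed as follows. This sketch takes $\|A\|_4$ as the maximum row sum and $\|A\|_5$ as the maximum column sum of absolute values, as in (6.2.13) and (6.2.14), and obtains $\|A\|_6$ from the larger eigenvalue of the $2 \times 2$ matrix $A^*A$ through the quadratic formula; the variable names are illustrative.

```python
import math

A = [[1 + 1j, 0], [1, -1]]
n = len(A)
entries = [abs(v) for row in A for v in row]

holder_1 = sum(entries)                                   # (6.2.10) with p = 1
euclid = math.sqrt(sum(e * e for e in entries))           # (6.2.11)
n_max = n * max(entries)                                  # (6.2.12)
row_sum = max(sum(abs(v) for v in row) for row in A)      # (6.2.13)
col_sum = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))  # (6.2.14)

# G = A*A, the conjugate transpose of A times A
G = [[sum(complex(A[k][i]).conjugate() * A[k][j] for k in range(n))
      for j in range(n)] for i in range(n)]
tr = (G[0][0] + G[1][1]).real
det = (G[0][0] * G[1][1] - G[0][1] * G[1][0]).real
largest_eig = (tr + math.sqrt(tr * tr - 4 * det)) / 2
spectral = math.sqrt(largest_eig)                         # (6.2.15)

print(holder_1, euclid, n_max, row_sum, col_sum, spectral)
```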
Note that there are several possible values for $\|A\|_7$ and $\|A\|_8$ depending upon which
vector norm is taken. For example, if we take the Euclidean norm and consider $\|A\|_8$
then it is a matter of maximizing $[Y^*Y]^{\frac{1}{2}}$ subject to the condition $X^*X = 1$ where $Y = AX$.
But $Y^*Y = X^*A^*AX$. The problem reduces to the following:


Maximize $X^*A^*AX$ subject to the condition $X^*X = 1$.


This is already done in Section 5.5 and the maximum is the largest eigenvalue of $A^*A$,
whose square root is $s_1$. Hence, when this particular vector norm is used,


$$\|A\|_8 = s_1 = \text{largest singular value of } A.$$


Note that for a vector norm $\|X\|$, $k\|X\|$ is also a vector norm when $k > 0$. This property
need not hold for a matrix norm $\|A\|$ due to condition (d) of the definition.


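Example 6.2.7. For an $n \times n$ matrix $A = (a_{ij})$ let $\alpha = \max_{i,j} |a_{ij}|$, that is, the largest of
the absolute values of the elements. Is this a norm of $A$?


Solution 6.2.7. Obviously conditions (a), (b), (c) of Definition 6.2.4 are satisfied. Let
us check condition (d). Let $B = (b_{ij})$ and $AB = C = (c_{ij})$. Then


$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \ \Rightarrow \ \max_{i,j} |c_{ij}| = \max_{i,j} \Big|\sum_{k=1}^{n} a_{ik}b_{kj}\Big|.$$


Suppose that the elements are all real and positive and that the largest ones in $A$ and
$B$ are $a_{11} = a$ and $b_{11} = b$. Then


$$\max_{i,j} |a_{ij}| = a, \quad \max_{i,j} |b_{ij}| = b, \quad [\max_{i,j} |a_{ij}|][\max_{i,j} |b_{ij}|] = ab$$


whereas
$$\max_{i,j} \Big|\sum_{k=1}^{n} a_{ik}b_{kj}\Big| = ab + \delta, \quad \delta > 0 \text{ when } n \geq 2,$$
since $c_{11}$ alone already equals $ab$ plus a sum of positive terms.


Hence condition (d) is evidently violated. Thus $\alpha$ cannot be a norm of the matrix $A$. It
is easy to note that $\beta = n\alpha$ is a norm of $A$, or


$$\beta = n\alpha = n \max_{i,j} |a_{ij}| = \|A\|_3. \tag{6.2.18}$$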


Example 6.2.8. Let $\mu_A = \max_j |\lambda_j|$ where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of an $n \times n$ matrix
$A$. Evidently $\mu_A$ is not a norm of $A$ since condition (a) of Definition 6.2.4 is not satisfied
by $\mu_A$. [Take a non-null triangular matrix with the diagonal elements zeros. Then
all eigenvalues are zeros.] Show that for any matrix norm $\|A\|$,


$$\|A\| \geq \mu_A. \tag{6.2.19}$$


This $\mu_A$ is called the spectral radius of the matrix $A$.

Solution 6.2.8. Let $\lambda_1$ be the eigenvalue of $A$ such that $\mu_A = |\lambda_1|$. Then, by definition,
there exists a non-null vector $X_1$ such that


$$AX_1 = \lambda_1 X_1.$$
Consider the $n \times n$ matrix
$$B = (X_1, O, \ldots, O).$$
Then
$$AB = (AX_1, O, \ldots, O) = (\lambda_1 X_1, O, \ldots, O) = \lambda_1 B.$$
From conditions (b) and (d) of Definition 6.2.4
$$|\lambda_1| \, \|B\| = \|\lambda_1 B\| = \|AB\| \leq \|A\| \, \|B\| \ \Rightarrow \ \|A\| \geq |\lambda_1|$$


since $\|B\| \neq 0$, by condition (a), due to the fact that $X_1$ is non-null. This establishes the result. The result
in (6.2.19) is a very important result which establishes a lower bound for norms of a
matrix, whatever be the norm of a matrix.


6.2.5 Compatible norms 


For any $n \times n$ matrix $A$ and $n \times 1$ vector $X$ if we take any matrix norm $\|A\|$ and any vector
norm $\|X\|$ then condition (d) of the definition, namely,


$$\|AX\| \leq \|A\| \, \|X\| \tag{6.2.20}$$
need not be satisfied.


Definition 6.2.5. For any matrix $A$ and any vector $X$, where $AX$ is defined, if (6.2.20)
is satisfied for a particular norm $\|A\|$ of $A$ and $\|X\|$ of $X$ then $\|A\|$ and $\|X\|$ are called
compatible norms.


It is not difficult to show that the following are compatible norms:


Matrix norm                                   Vector norm
$\|A\|_4$ of (6.2.13)                          $\|X\|_\infty$ of (6.2.9)
$\|A\|_5$ of (6.2.14)                          $\|X\|_1$ of (6.2.9)
$\|A\|_6$ of (6.2.15)                          $\|X\|_2$ of (6.2.9)
$\|A\|_7$ with any vector norm $\|X\|$         $\|X\|$
$\|A\|_8$ with any vector norm $\|X\|$         $\|X\|$


Example 6.2.9. Show that $\|A\|_4$ of (6.2.13) and $\|X\|_\infty$ of (6.2.9) are compatible norms.

Solution 6.2.9. Let $X$ be an $n \times 1$ vector with $\|X\|_\infty = 1$. Consider the vector norm
$$\|AX\|_\infty = \max_i \Big|\sum_{j=1}^{n} a_{ij}x_j\Big| \leq \max_i \sum_{j=1}^{n} |a_{ij}| \, |x_j| \leq \|X\|_\infty \|A\|_4$$


which establishes the compatibility.


6.2.6 Matrix power series and rate of convergence 


Let $A$ be an $n \times n$ matrix and consider the power series
$$f(A) = I + A + A^2 + \cdots. \tag{6.2.21}$$


We have already seen that the power series in (6.2.21) is convergent when all the eigenvalues
of $A$ are less than 1 in absolute value, that is, $0 \leq |\lambda_j| < 1$, $j = 1, \ldots, n$ where the
$\lambda_j$'s are the eigenvalues of $A$. If $\|A\|$ denotes a norm of $A$ then evidently


$$\|A^k\| = \|AA \cdots A\| \leq \|A\|^k.$$
Then from (6.2.21) we have, for a norm with $\|I\| = 1$,


$$\|I + A + A^2 + \cdots\| \leq 1 + \|A\| + \|A\|^2 + \cdots = \frac{1}{1 - \|A\|} \quad \text{if } \|A\| < 1.$$


Therefore if the power series in (6.2.21) is approximated by taking the first $k$ terms, that
is,


$$f(A) \approx I + A + \cdots + A^{k-1} \tag{6.2.22}$$
then the error in this approximation is given by
$$A^k + A^{k+1} + \cdots = A^k[I + A + A^2 + \cdots] \ \Rightarrow \ \|A^k + A^{k+1} + \cdots\| \leq \frac{\|A\|^k}{1 - \|A\|} \quad \text{if } \|A\| < 1. \tag{6.2.23}$$


Thus a measure of an upper bound for the error in the approximation in (6.2.22) is
given by (6.2.23).
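
The bound (6.2.23) can be checked on a small case. The sketch below uses the maximum row sum norm $\|A\|_4$ of (6.2.13), for which $\|I\| = 1$, a hypothetical $2 \times 2$ matrix with $\|A\|_4 = 0.5$, and $k = 5$ terms; all of these choices are illustrative.

```python
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def row_sum_norm(M):
    """Maximum row sum of absolute values, the norm in (6.2.13)."""
    return max(sum(abs(v) for v in row) for row in M)

A = [[0.2, 0.3], [0.1, 0.2]]   # hypothetical matrix, row sums 0.5 and 0.3
k = 5                          # number of terms kept: I + A + ... + A^(k-1)
norm_A = row_sum_norm(A)

partial = [[1.0, 0.0], [0.0, 1.0]]
power = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(k - 1):
    power = mat_mul(power, A)
    partial = [[partial[r][c] + power[r][c] for c in range(2)] for r in range(2)]

# Exact sum (I - A)^{-1} from the 2 x 2 inverse formula
M = [[1 - A[0][0], -A[0][1]], [-A[1][0], 1 - A[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
exact = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

error = row_sum_norm([[exact[r][c] - partial[r][c] for c in range(2)] for r in range(2)])
bound = norm_A ** k / (1 - norm_A)
print(error, bound)
```

In this instance the actual error is well below the bound, which is expected since (6.2.23) only uses the norm of $A$ and not its finer structure.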


6.2.7 An application in statistics


In the field of design of experiments and analysis of variance, connected with two-way
layouts with multiple observations per cell, the analysis of the data becomes quite
complicated when the cell frequencies are unequal. Such a situation can arise, for example,
in a simple randomized block experiment with replicates (the experiment is
repeated a number of times under identical conditions). If some of the observations
are missing in some of the replicates then in the final two-way layout (blocks versus
treatments) the cell frequencies will be unequal. In such a situation, in order to estimate
the treatment effects or block effects (main effects) one has to solve a singular
system of a matrix equation of the following type (this arises from the least square
analysis):


$$(I - A)\hat{\alpha} = Q \tag{6.2.24}$$


where $\alpha' = (\alpha_1, \ldots, \alpha_p)$ are the block effects to be estimated, $\hat{\alpha}$ denotes the estimated
value, $A = (a_{ij})$ is a $p \times p$ matrix
and $Q$ is a known column vector. The matrix $A$ is the incidence matrix of this design.
From the design itself the $\alpha_j$'s satisfy the condition


$$\alpha_1 + \cdots + \alpha_p = 0. \tag{6.2.25}$$


Observe that $I - A$ is a singular matrix (the sum of the elements in each row of $A$ is 1). Obviously
we cannot write and expand


$$\hat{\alpha} = (I - A)^{-1}Q = [I + A + A^2 + \cdots]Q$$


due to the singularity of $I - A$. Let $k_1, \ldots, k_p$ be the medians of the elements in the first,
second, ..., $p$-th rows of $A$ and consider a matrix $B = (b_{ij})$, $b_{ij} = (a_{ij} - k_i)$ for all $i$ and $j$.
Evidently $(I - B)$ is nonsingular. Consider


$$(I - B)\hat{\alpha} = (I - A + K)\hat{\alpha} = (I - A)\hat{\alpha} + K\hat{\alpha}$$


where $K$ is a matrix in which all the elements in the $i$-th row are equal to $k_i$, $i = 1, \ldots, p$.
Then with (6.2.25) we have $K\hat{\alpha} = O$ and hence


$$(I - A)\hat{\alpha} = (I - B)\hat{\alpha} = Q \ \Rightarrow \ \hat{\alpha} = (I - B)^{-1}Q = (I + B + B^2 + \cdots)Q.$$


Take the norm $\|B\|_4$ of (6.2.13). That is,
$$\|B\|_4 = \max_i \sum_{j=1}^{p} |a_{ij} - k_i|.$$


Since the mean deviation is least when the deviations are taken from the median, $\|B\|_4$
is the least possible for the incidence matrix $A$ so that the convergence of the series
$I + B + B^2 + \cdots$ is made the fastest possible. In fact, for all practical purposes of testing
statistical hypotheses on the $\alpha_j$'s a good approximation is available by taking


$$\hat{\alpha} \approx (I + B)Q$$


where inversion or taking powers of $B$ is not necessary. For an application of the above
procedure to a specific problem in testing of statistical hypothesis see [1].


Exercises 6.2 


6.2.1. If $p(\lambda)$ is a polynomial defined on the spectrum of an $n \times n$ matrix $A$ then show
that $p(A') = [p(A)]'$.


6.2.2. For any $n \times n$ matrix $A$ show that there exists a skew symmetric matrix $B$ such
that $A = e^{B}$ if and only if $A$ is a real orthogonal matrix with its determinant 1.


1 


1 
6.2.3. For the matrix A = | 52 | sum up the following matrix series, if possible: 
272 


$$I + A + A^2 + \cdots.$$
6.2.4. For the same matrix in Exercise 6.2.3 sum up the series
$$I + 2A + 3A^2 + 4A^3 + \cdots.$$


6.2.5. For the same matrix in Exercise 6.2.3 sum up the series 


1, 34? (6). 
I+-A+ + + 
2 4 2! 8 3! 
6.2.6. Show that the norms |X|, and |X| in (6.2.9) satisfy all the conditions in 


(6.2.8). 


6.2.7. Prove that the norm defined in (6.2.17) is a matrix norm, and from there prove 
that (6.2.16) is also a matrix norm. 


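6.2.8. For any $n \times n$ matrix $A$ consider the Euclidean matrix norm $\|A\|_2$ of (6.2.11). Let
$\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$ and let $\Re(\lambda_i)$ = real part of $\lambda_i$ and $\Im(\lambda_i)$ = imaginary
part of $\lambda_i$. Then show that


$$\sum_{i=1}^{n} |\lambda_i|^2 \leq \|A\|_2^2,$$
$$\sum_{i=1}^{n} |\Re(\lambda_i)|^2 \leq \|B\|_2^2, \quad B = \frac{1}{2}(A + A^*),$$
$$\sum_{i=1}^{n} |\Im(\lambda_i)|^2 \leq \|C\|_2^2, \quad C = \frac{1}{2}(A - A^*)$$


and that the equality in any one of these implies equality in all the three above and
equality occurs if and only if $A$ is a normal matrix.

6.2.9. For any $n \times n$ matrix $A = (a_{ij})$ let $\lambda$ be any eigenvalue of $A$. Then show that
$$|\lambda| \leq n\rho, \quad |\Re(\lambda)| \leq n\sigma, \quad |\Im(\lambda)| \leq n\gamma$$
where $\Re(\cdot)$ and $\Im(\cdot)$ denote the real part and the imaginary part of $(\cdot)$ respectively, and


$$\rho = \max_{i,j} |a_{ij}|,$$
$$\sigma = \max_{i,j} |b_{ij}|, \quad B = (b_{ij}) = \frac{1}{2}(A + A^*),$$
$$\gamma = \max_{i,j} |c_{ij}|, \quad C = (c_{ij}) = \frac{1}{2}(A - A^*).$$


6.2.10. For any $n \times n$ matrix $A = (a_{ij})$ and for any eigenvalue $\lambda$ of $A$ show that


$$|\Im(\lambda)| \leq \alpha\sqrt{\frac{n(n-1)}{2}}, \quad \alpha = \frac{1}{2}\max_{i,j} |a_{ij} - a_{ji}|.$$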


6.2.11. For any $n \times n$ matrix $A$ let


$$B = \begin{bmatrix} I & A \\ A^* & I \end{bmatrix}.$$
Show that $B$ is positive definite if and only if $\|A\|_6 < 1$ where $\|A\|_6$ is the norm defined
in (6.2.15).


6.2.12. If $B$ is positive definite and $A$ is positive semi-definite then show that the eigenvalues
$\lambda_j$'s of $(A + B)^{-1}A$ are such that $0 \leq \lambda_j < 1$.


6.2.13. For an arbitrary nx n matrix A let B=[_9. 4], i= V-1, then show that |Bll; = 
|All, see equation (6.2.15) for the norms. 


6.2.14. For an arbitrary n x n matrix A show that |A| < min(|Al/;, |All2), see equations 
(6.2.13) and (6.2.14) for the norms. 


6.3 Singular value decomposition of a matrix 


For the sake of readers who are interested in further results on matrices, a few more 
technical terms will be listed here. Recalling our standard notations, let A' be the 
transpose and A* the conjugate transpose of a matrix A. Two matrices A and B are 
said to be similar if there exists a nonsingular matrix Q such that 


\[
A = QBQ^{-1}.
\]


If Q is an n x n orthonormal matrix then QQ' = I, Q'Q = I and thereby $Q^{-1} = Q'$. If Q is 
unitary, that is, QQ* = I, Q*Q = I, then $Q^{-1} = Q^*$. If A and B are such that 


A = UBU* 


for a unitary matrix U then A and B are said to be unitarily similar. If a square matrix 
A is unitarily similar to a diagonal matrix, that is, 


A = UDU* 


where U is unitary and D is diagonal, then A is called a normal matrix. If 


A = UDU' 


where U is an orthonormal matrix, UU' = I, U'U = I, and D is a diagonal matrix, then 
A is said to be orthogonally similar to a diagonal matrix. If there exists a nonsingular 
matrix P such that 


A = PBP* 


then A and B are said to be congruent. If, in addition, P is unitary then A and B are also 
unitarily similar. It is not difficult to establish the following results: 


(i) An n x n matrix A is normal if and only if A* is normal. If A is normal then $A^p$ is 
also normal for any positive integer p, or for any integer p if $|A| \neq 0$. 

(ii) Any square matrix is unitarily similar to an upper triangular matrix. 

(iii) An n x n matrix A is normal if and only if AA* = A*A. 

(iv) A normal matrix A is Hermitian if and only if its spectrum lies on the real line 
(eigenvalues are real). 

(v) A matrix A is normal if and only if its real part and imaginary part commute, 
that is, when $A = A_1 + iA_2$, $i = \sqrt{-1}$, with $A_1 = \frac{1}{2}(A + A^*)$ and 
$A_2 = \frac{1}{2i}(A - A^*)$ Hermitian matrices, then $A_1A_2 = A_2A_1$. 

(vi) A real symmetric matrix A is positive definite (or positive semi-definite) if and 
only if it has a positive definite (or positive semi-definite) square root B, that is, 
$A = B^2$, and further, rank(A) = rank(B). 
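
Result (iii) gives a test for normality that is easy to try numerically. The following is a minimal sketch in plain Python; the helper names (`conj_transpose`, `matmul`, `is_normal`) and the three sample matrices are illustrative choices, not taken from the text.

```python
def conj_transpose(M):
    """Return the conjugate transpose M* of a matrix stored as a list of rows."""
    return [[complex(M[i][j]).conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def is_normal(A, tol=1e-12):
    """Result (iii): a square matrix A is normal if and only if AA* = A*A."""
    A_star = conj_transpose(A)
    left, right = matmul(A, A_star), matmul(A_star, A)
    n = len(A)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

rotation = [[0, -1], [1, 0]]              # real and nonsymmetric, yet normal
hermitian = [[2, 1 - 1j], [1 + 1j, 3]]    # Hermitian, hence normal
shear = [[1, 1], [0, 1]]                  # upper triangular, not normal

print(is_normal(rotation), is_normal(hermitian), is_normal(shear))
```

The shear matrix illustrates that result (ii) does not conflict with (iii): every square matrix is unitarily similar to a triangular matrix, but only normal matrices reach a diagonal one.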


Definition 6.3.1 (Singular values). Consider an arbitrary m x n rectangular matrix A. 
Then A*A is n x n and A*A is nonnegative definite (positive definite or positive 
semi-definite). Then there exists a nonnegative definite square root B such that $A^*A = B^2$. 
The eigenvalues $s_1,\dots,s_n$ of $B = (A^*A)^{1/2}$ are called the singular values of the 
rectangular matrix A. Thus the singular values are the eigenvalues of $(A^*A)^{1/2}$ and 
thereby they are nonnegative real numbers. 


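Example 6.3.1. Evaluate the singular values of the matrix 

\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}.
\]

Solution 6.3.1. Since A is real, A*A = A'A. That is, 

\[
A'A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.
\]

The characteristic equation $|A'A - \lambda I| = 0$ gives 

\[
\lambda(\lambda^2 - 4\lambda + 2) = 0.
\]

The solutions are $\lambda_1 = 2 + \sqrt{2}$, $\lambda_2 = 2 - \sqrt{2}$, $\lambda_3 = 0$. Hence the singular values of A are 

\[
s_1 = (2 + \sqrt{2})^{1/2}, \quad s_2 = (2 - \sqrt{2})^{1/2}, \quad s_3 = 0.
\]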


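Let us see the eigenvalues of AA*. 

\[
AA^* = AA' = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 3 & 1 \\ 1 & 1 \end{bmatrix}.
\]

The eigenvalues of AA' are $\lambda_1 = 2 + \sqrt{2}$ and $\lambda_2 = 2 - \sqrt{2}$. Thus the nonzero eigenvalues 
of AA* and A*A coincide. This, in fact, is a general result. 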


(vii) For any rectangular m x n matrix A the nonzero eigenvalues of A*A and AA* 
coincide. 


As a practical procedure, consider the eigenvalues of AA* if A is m x n with $m \le n$, or 
the eigenvalues of A*A if $n \le m$, so that the square roots of these nonzero eigenvalues 
provide all the nonzero singular values of A. If there are r such nonzero singular values 
then the remaining singular values are zeros and there are n - r such zeros if the matrix 
A is m x n. 
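
The practical procedure can be sketched for the simplest case, a real matrix with two rows, where the smaller matrix AA' is 2 x 2 and its characteristic equation is a quadratic. This is only an illustration in plain Python; the function name `singular_values_two_rows` is hypothetical, and the sample matrix is the one from Example 6.3.1.

```python
import math

def singular_values_two_rows(A):
    """Singular values of a real 2 x n matrix A obtained from the 2 x 2 matrix AA'.

    Solves the characteristic equation lam^2 - trace*lam + det = 0 of AA'
    in closed form and returns the square roots of its two eigenvalues.
    """
    r1, r2 = A
    a = sum(x * x for x in r1)                # (AA')_{11}
    b = sum(x * y for x, y in zip(r1, r2))    # (AA')_{12} = (AA')_{21}
    d = sum(y * y for y in r2)                # (AA')_{22}
    trace, det = a + d, a * d - b * b
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace + disc) / 2.0, (trace - disc) / 2.0]
    # Clip tiny negative values caused by rounding before taking square roots.
    return [math.sqrt(max(lam, 0.0)) for lam in eigenvalues]

A = [[1, 1, 1], [1, 0, 0]]    # the matrix of Example 6.3.1 (m = 2, n = 3)
s1, s2 = singular_values_two_rows(A)
print(s1, s2)                 # the remaining n - r = 1 singular value is zero
```

Working with AA' here means solving a quadratic instead of the cubic that A'A produces, which is the point of choosing the smaller of the two products.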


6.3.1 A singular value decomposition 


A very interesting representation of an arbitrary m x n matrix A is a decomposition 
in terms of the singular values of A. There exist an m x m unitary matrix U, U*U =I, 
UU* = I (orthonormal if real), and an n x n unitary matrix V, V*V =I, VV* = I (or- 
thonormal if real), such that 


\[
A = UDV^* \tag{6.3.1}
\]


with D an m x n matrix having $s_1,\dots,s_r$ at the leading diagonal positions and zeros 
elsewhere, where $s_1,\dots,s_r$ are the nonzero singular values of A with r denoting the 
rank of A. The representation in (6.3.1) is known as the singular value decomposition 
of A. 

It is not difficult to prove the result in (6.3.1). Let A be an m x n matrix. Since A*A 
and AA* are both Hermitian (real symmetric when A is real) we can always construct an 
orthonormal system of eigenvectors for each. Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$ be orthonormal 
systems of eigenvectors of A*A and AA* respectively. Let $\lambda_i$ be a nonzero eigenvalue of 
A*A corresponding to $X_i$. Then 

\[
A^*AX_i = \lambda_iX_i \;\Rightarrow\; X_i^*A^*A = \lambda_iX_i^* \;\Rightarrow\; X_i^*A^*AX_i = \lambda_i
\]

where $\lambda_i > 0$ since A*A is at least Hermitian positive semi-definite. But 

\[
X_i^*A^*AX_i = (AX_i)^*(AX_i) = \|AX_i\|^2 \;\Rightarrow\; \|AX_i\| = \sqrt{\lambda_i}
\]

where $\|AX_i\|$ is the Euclidean length of the vector $AX_i$. Let 

\[
Y_i = \frac{1}{\|AX_i\|}AX_i \;\Rightarrow\; AA^*Y_i = \frac{1}{\|AX_i\|}(AA^*)AX_i
= \frac{A(A^*A)X_i}{\|AX_i\|} = \lambda_i\frac{AX_i}{\|AX_i\|} = \lambda_iY_i.
\]


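Therefore $(AA^*)Y_i = \lambda_iY_i$. Thus $Y_i$ is an eigenvector of AA* corresponding to the same 
eigenvalue $\lambda_i$, and from the starting point above, 

\[
AX_i = \|AX_i\|\,Y_i = \sqrt{\lambda_i}\,Y_i = s_iY_i \tag{6.3.2}
\]

where $s_i$ is the i-th singular value of A. For a zero eigenvalue, $\|AX_i\|^2 = \lambda_i = 0$ 
gives $AX_i = O$. Now, let 

\[
U = (Y_1,\dots,Y_m) \quad\text{and}\quad V = (X_1,\dots,X_n).
\]

Then from (6.3.2) 

\[
AV = (s_1Y_1, s_2Y_2, \dots, s_rY_r, O, \dots, O) = UD \tag{6.3.3}
\]

where $U = (Y_1,\dots,Y_m)$, the columns $Y_{r+1},\dots,Y_m$ being orthonormal eigenvectors of AA* 
for the eigenvalue zero, and D is an m x n matrix with the leading diagonal positions 
having $s_1,\dots,s_r$ with r being the rank of A. Postmultiply (6.3.3) with V* to obtain 

\[
A = UDV^*
\]

and the result is established. 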
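
The argument uses that U is unitary, which requires the vectors $Y_i = AX_i/\|AX_i\|$, $i = 1,\dots,r$, to be orthonormal. This step is not spelled out in the proof; a short check, using only the orthonormality of the $X_i$ and $A^*AX_j = \lambda_jX_j$ with $s_j^2 = \lambda_j$, could run as follows.

```latex
\[
Y_i^{*}Y_j
  = \frac{(AX_i)^{*}(AX_j)}{\|AX_i\|\,\|AX_j\|}
  = \frac{X_i^{*}(A^{*}AX_j)}{s_i s_j}
  = \frac{\lambda_j\,X_i^{*}X_j}{s_i s_j}
  = \begin{cases} 1, & i = j,\\ 0, & i \neq j, \end{cases}
\qquad i, j = 1, \dots, r .
\]
```

For $i = j$ the ratio is $\lambda_i/s_i^2 = 1$, and for $i \neq j$ it vanishes because $X_i^*X_j = 0$.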


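Example 6.3.2. Obtain the singular value decomposition of the matrix 

\[
A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}.
\]


Solution 6.3.2. Since A is 2 x 3 we consider the matrix AA* = AA': 

\[
AA' = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}.
\]

The eigenvalues of AA* are evidently $\lambda_1 = 3$, $\lambda_2 = 1$ and hence the nonzero singular 
values of A are $s_1 = \sqrt{3}$, $s_2 = 1$. 

\[
A^*A = A'A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ -1 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ -1 & 1 & 2 \end{bmatrix}.
\]

The singular values of A are therefore $s_1 = \sqrt{3}$, $s_2 = 1$, $s_3 = 0$. Let us compute the 
eigenvectors of A*A = A'A: 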


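\[
(A'A - \lambda I)Z = O,\ \lambda = 3 \;\Rightarrow\; Z_1' = (1, -1, -2),
\qquad
X_1 = \frac{Z_1}{\|Z_1\|} = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}.
\]

Corresponding to $\lambda_2 = 1$ and $\lambda_3 = 0$ we have the normalized eigenvectors of A'A given 
(up to sign) by 

\[
X_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad
X_3 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.
\]

These are the normalized eigenvectors of A*A, and they form an orthonormal system. 
Then 

\[
V = (X_1, X_2, X_3) = \begin{bmatrix}
\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\
-\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \\
-\frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}}
\end{bmatrix}.
\]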

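Consider $AX_1$, $AX_2$, $AX_3$: 

\[
AX_1 = \frac{3}{\sqrt{6}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad
AX_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad
AX_3 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]

Therefore 

\[
Y_1 = \frac{AX_1}{\|AX_1\|} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad
Y_2 = \frac{AX_2}{\|AX_2\|} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]

and hence 

\[
U = (Y_1, Y_2) = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\]

and 

\[
A = UDV^* =
\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{6}} & -\frac{2}{\sqrt{6}} \\
\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\
\frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}}
\end{bmatrix}.
\]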
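
A decomposition of this kind is easy to check by multiplying the factors back together. The sketch below does so in plain Python for the matrix of Example 6.3.2. The helper functions are illustrative, and the signs of the eigenvectors in V are one admissible choice (an assumption of this sketch, since each normalized eigenvector is determined only up to sign).

```python
import math

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two equal-sized matrices."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1, 0, -1], [0, 1, 1]]
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)

# Columns of V: normalized eigenvectors of A'A for the eigenvalues 3, 1, 0.
V = [[ 1 / r6, 1 / r2,  1 / r3],
     [-1 / r6, 1 / r2, -1 / r3],
     [-2 / r6, 0.0,     1 / r3]]
# Columns of U: Y_i = A X_i / ||A X_i|| for i = 1, 2.
U = [[ 1 / r2, 1 / r2],
     [-1 / r2, 1 / r2]]
# D carries the nonzero singular values sqrt(3) and 1 on its leading diagonal.
D = [[r3, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

reconstructed = matmul(matmul(U, D), transpose(V))
print(max_abs_diff(reconstructed, A))                     # rounding-error level
print(max_abs_diff(matmul(transpose(U), U), identity(2)))
print(max_abs_diff(matmul(transpose(V), V), identity(3)))
```

All three printed values should be at the level of floating-point rounding, confirming A = UDV' together with U'U = I and V'V = I.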


Note that equation (6.3.1) provides a way of defining unitary equivalence of rectangular 
matrices. Two m x n matrices A and B are said to be unitarily equivalent if there exist 
unitary matrices U and V such that 


A = UBV* (6.3.4) 


and the matrix D in (6.3.1) is called the canonical form of the rectangular matrix A. It 
is not difficult to establish the following result: 


(viii) Two m x n matrices are unitarily equivalent if and only if they have the same 
singular values. 


6.3.2 Canonical form of a bilinear form 


One interesting application of (6.3.1) is the reduction of a bilinear form in real or 
complex vectors. Let X be an m x 1 vector, Y an n x 1 vector and A an m x n matrix. Consider 
the bilinear form 

\[
\alpha = X^*AY \tag{6.3.5}
\]

where $\alpha$ is linear in X as well as in Y, and $A = (a_{ij})$ is free of X and Y. Consider the singular 
value decomposition of A as given in (6.3.1). Then 

\[
\alpha = X^*UDV^*Y, \qquad D = \begin{bmatrix} S & O \\ O & O \end{bmatrix}
\]

where $S = \mathrm{diag}(s_1, s_2, \dots, s_r)$, with $s_1,\dots,s_r$ being the nonzero singular values of A, r 
indicating the rank of A. Consider the unitary transformations (orthogonal 
transformations when U and V are orthonormal) 

\[
X^*U = T^*, \quad T = \begin{bmatrix} t_1 \\ \vdots \\ t_m \end{bmatrix}
\quad\text{and}\quad
V^*Y = W = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}.
\]

Then 

\[
\alpha = T^*DW = s_1t_1^*w_1 + \dots + s_rt_r^*w_r \tag{6.3.6}
\]

where $t_j^*$ denotes the complex conjugate of $t_j$. This form in (6.3.6) is the canonical form 
of the bilinear form $\alpha$. Several applications of bilinear forms may be found in [8]. 
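
The reduction to the canonical form can be tried on a small real case. The sketch below, in plain Python, reuses the matrix of Example 6.3.2 with one admissible sign choice for its singular vectors; the test vectors X and Y and the function names `bilinear` and `canonical` are arbitrary illustrative choices, not taken from the text.

```python
import math

A = [[1, 0, -1], [0, 1, 1]]          # matrix of Example 6.3.2
r2, r3, r6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
U_cols = [[1 / r2, -1 / r2], [1 / r2, 1 / r2]]                 # Y_1, Y_2
V_cols = [[1 / r6, -1 / r6, -2 / r6], [1 / r2, 1 / r2, 0.0],
          [1 / r3, -1 / r3, 1 / r3]]                            # X_1, X_2, X_3
s = [r3, 1.0]                        # nonzero singular values s_1, s_2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bilinear(X, A, Y):
    """X'AY computed directly from the entries of A."""
    return sum(X[i] * A[i][j] * Y[j]
               for i in range(len(X)) for j in range(len(Y)))

def canonical(X, Y):
    """s_1 t_1 w_1 + ... + s_r t_r w_r with T = U'X and W = V'Y (real case)."""
    t = [dot(col, X) for col in U_cols]
    w = [dot(col, Y) for col in V_cols]
    # zip stops after the r nonzero singular values, so w_3 does not contribute.
    return sum(s_i * t_i * w_i for s_i, t_i, w_i in zip(s, t, w))

X, Y = [2.0, -1.0], [1.0, 3.0, -2.0]   # arbitrary test vectors
print(bilinear(X, A, Y), canonical(X, Y))
```

Both printed values agree (they equal 5 up to rounding), and only r = 2 products appear in the canonical form although the original bilinear form has mn = 6 terms.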


Exercises 6.3 


6.3.1. Construct a 3 x 3 nonsymmetric matrix A with positive eigenvalues for which 
there exists a non-null 3 x 1 vector X such that (1) X'AX = 0, (2) X' AX < 0, thereby 
showing that definiteness of a matrix cannot be associated with nonsymmetric or non- 
Hermitian matrices. 


6.3.2. For any rectangular matrix A show that AA* and A* A are nonnegative definite 
(positive definite or positive semi-definite), where A* denotes the conjugate transpose 
of A. 


6.3.3. If A is a positive definite matrix then show that there exists a unique lower tri- 
angular matrix T with positive diagonal elements such that A = TT*. [This is known 
as the Cholesky factorization of A.] 


6.3.4. Show that any n x n Hermitian matrix A is congruent to the matrix 


\[
D = \begin{bmatrix} I_r & O & O \\ O & -I_{s-r} & O \\ O & O & O \end{bmatrix}
\]


where s is the number of nonzero eigenvalues and r is the number of positive eigen- 
values of A. 


6.3.5. Show that a Hermitian matrix A is positive definite if and only if it is congruent 
to the identity matrix. 


6.3.6. Show that two n x n Hermitian matrices A and B are congruent if and only if 
rank(A) = rank(B) and the number of positive eigenvalues of both matrices is the same. 


6.3.7. Compute the singular values of the following matrices: 


1 1|, C=[1,1,-1]. 
0 -1 

6.3.8. Show that the singular values of square matrices are invariant under unitary 
transformations. 


6.3.9. Obtain the singular value decompositions of the matrices in Exercise 6.3.7. 


6.3.10. Show that two m x n matrices A and B are unitarily equivalent if and only if 
the matrices A*A and B*B are similar. 


6.3.11. For any n x n matrix A show that there exists a unitary matrix U and an up- 
per triangular matrix T whose diagonal elements are the eigenvalues of A, such that 
U*AU =T. 


6.3.12. For n x n matrices A and B where A is positive definite and B is positive 
semi-definite show that there exists a nonsingular matrix Q such that 


A = QQ' and B = QDQ' 


where D is a diagonal matrix. 


6.3.13. Let A be positive definite and B positive semi-definite, where A + B is defined. 
Then show that $|A + B| \ge |A|$, with equality if and only if B = O. 


6.3.14. Let A and B be positive definite matrices. Then show that $A - B$ is positive 
definite if and only if $B^{-1} - A^{-1}$ is positive definite. 


6.3.15. If A and B are positive definite and $A - B$ is positive semi-definite then show 
that $|A| \ge |B|$ with equality if A = B. 


6.3.16. If A is positive definite and $I - A$ is positive semi-definite with |A| = 1 then show 
that A = I. 


6.3.17. If A is positive definite then show that $A + A^{-1} - 2I$ is positive semi-definite. 


6.3.18. Show that $(I - AB)^{-1} = I + A(I - BA)^{-1}B$ whenever the inverses exist. 


6.3.19. Show that 

\[
(\alpha I - A)^{-1} - (\beta I - A)^{-1} = (\beta - \alpha)(\beta I - A)^{-1}(\alpha I - A)^{-1}
\]

whenever the inverses exist, where $\alpha$ and $\beta$ are scalars. 


6.3.20. Let A be a positive definite matrix and $\alpha$ a positive scalar. Then show that 
$|B| \le \alpha|A|$ where 

\[
B = \begin{bmatrix} A & b \\ b^* & \alpha \end{bmatrix}, \quad b \text{ a vector.}
\]